Apparatus, method, and computer program product for checking hypertext

ABSTRACT

A hypertext checking apparatus comprises: a hypertext database 21 which stores information about a page and a link; an information collecting unit 11 which collects information about the page and the link in the hypertext obtained from the hypertext database 21; a condition detecting unit 13 which refers to the hypertext database 21 to detect a part including a logically mismatched link; a candidate providing unit 12 which provides a correction candidate related to the part detected by the condition detecting unit 13; and a correction reflecting unit 14 which corrects the hypertext based on the part detected by the condition detecting unit 13 and the correction candidate provided by the candidate providing unit 12.

[0001] This application is based on Japanese patent application No. 2002-302585, the content of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to an apparatus, method, and computer program product for checking a hypertext, and more particularly, to an apparatus, method, and computer program product for detecting a part having an error in a link source description and in a relationship between links in a hypertext.

[0004] 2. Description of the Related Art

[0005] In recent years, companies, organizations, and individuals have had many occasions to publish computerized information on Internet sites. Most of the information published on these sites is hypertext.

[0006] A first example of the conventional technology for checking a hypertext on the Internet is disclosed in non-patent literature on the link checker “LinkScan™” produced by Elsop™ (Electronic Software Publishing Corporation), at URL: http:/www.elsop.com/linkscan/ on the Internet, searched on Oct. 9, 2002. This is a tool that automatically traverses hypertexts over the Internet and records a log whenever an error occurs. There are several types of such link checkers, including one type that diagnoses a target online at a specified address, and another type that diagnoses a specified folder on a hard disk offline.

[0007] A second example of the conventional technology, for detecting a physical mismatch in a link, is disclosed in Japanese Non-examined Patent Publication No. 2001-273185. The method in this conventional technology comprises the steps of: storing an address of the hypertext to be managed in a database; and checking whether or not there is a document at the stored address of the hypertext, thereby making it possible to detect a physical mismatch such as a dead link. The above conventional method further comprises the step of registering in advance, on a system, a keyword and an image for identifying each of the documents in the database. In the conventional method, when a dead link is detected, it is possible to search for the vanished page with a search engine and then provide a correction candidate.

[0008] A third example of the conventional technology is a typical system for checking a document, including document correcting systems such as the auto-correcting function in Microsoft® Word produced by Microsoft Corporation. These document correcting systems are operable to detect an inappropriate expression, such as an error in a declensional “Kana” ending (“Kana” being a kind of Japanese character) or a repeated Japanese postpositional particle, and then to output a correction candidate.

[0009] A first problem to be solved is that, in the aforementioned first and second examples of the conventional technologies, only a physically mismatched link can be detected, and a logically mismatched link cannot be detected. This is because those technologies judge whether there is a mismatch based only on whether an error is returned from a server when a connection to an address of a hypertext is made. At present, detecting a logical mismatch has no choice but to rely on manual and visual confirmation in a browser, because no error occurs in the case of a logical mismatch.

[0010] A second problem to be solved is that, in the aforementioned first and second examples of the conventional technologies, a correction candidate can be provided only for a physical mismatch and not for a logical mismatch. The reason for this problem is similar to that of the above first problem. A third problem to be solved is that manual and visual confirmation in the browser incurs an enormous cost. The reason for this problem is that a large-scale site, such as that of a company, has between thousands and tens of thousands of hypertexts, and the number of links between documents reaches between tens of thousands and hundreds of thousands. Confirming all of these links is not realistic in terms of time and cost. Confirmation in the browser is also apt to overlook a phantom link and the like.

[0011] A fourth problem to be solved is that, in the aforementioned third conventional technology, a logical mismatch such as disunity in the link source descriptions cannot be detected, even though the audience is confused when links to the same document have different link source descriptions. The reason for this problem is that a link source description that includes no unsuitable expression is regarded as normal.

SUMMARY OF THE INVENTION

[0012] It is therefore a first object of the present invention to provide an apparatus, method, and computer program product for checking a hypertext in which not only a physical mismatch but also a logical mismatch can be detected.

[0013] It is a second object of the present invention to provide an apparatus, method, and computer program product for checking a hypertext in which it is possible to provide an administrator with a correction candidate for not only a physical mismatch but also a logical mismatch.

[0014] It is a third object of the present invention to provide an apparatus, method, and computer program product for checking a hypertext in which the cost of the mismatch check can be considerably reduced.

[0015] In accordance with an aspect of the present invention, there is provided an apparatus for checking a hypertext, targeting a hypertext database, and being capable of detecting at least one part of a logically mismatched link including: a part having a mismatch between a link source description and contents on the link target page; a part having a mismatch between a link source description and contents on the link target page that is caused by correcting contents in the link target page; a part causing disunity among a plurality of different link source descriptions having the same link target page; a part causing disunity in styles among a plurality of different link source descriptions within the same page and its peripheral pages; a part of a link having no link source description; and a part in which all of the link source descriptions in a group of links forming a loop and corresponding to this group of links are related to the same topic.

[0016] More specifically, a first hypertext checking apparatus comprises: an information storing unit capable of storing therein information about a page and link in the hypertext; and a condition detecting unit for referring to said information storing unit to detect some parts of a logically mismatched link.

[0017] A second hypertext checking apparatus comprises: an information collecting unit for collecting information about a page and link in the hypertext; an information storing unit capable of storing therein said information about the page and link; and a condition detecting unit for referring to said information storing unit to detect some parts of a logically mismatched link.

[0018] A third hypertext checking apparatus comprises: the constitutional elements of the first and second hypertext checking apparatus; and a candidate providing unit for calculating a correction candidate concerning said parts detected by said condition detecting unit.

[0019] A fourth hypertext checking apparatus comprises: the constitutional elements of the third hypertext checking apparatus; and an importance calculating unit for calculating and outputting an importance value of the part detected by said condition detecting unit.

[0020] A fifth hypertext checking apparatus comprises: the constitutional elements of the third and fourth hypertext checking apparatus; and a correction reflecting unit for correcting said hypertext based on the part of the mismatched link detected by said condition detecting unit and the correction candidate calculated by said candidate providing unit.

[0021] A sixth hypertext checking apparatus comprises: the constitutional elements of the fourth hypertext checking apparatus; and a total score calculating unit for calculating and outputting a total score concerning said hypertext in accordance with at least one factor, or a combination of a plurality of factors, including the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the ratio of the number of said parts detected by said condition detecting unit to the total number of links.
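By way of illustration only, the combination of factors recited for the total score calculating unit may be sketched as a weighted sum. The function name, the weights, and the choice of a linear combination are assumptions made for this sketch; the specification above does not fix any particular formula.

```python
def total_score(importance_values, total_links,
                w_importance=1.0, w_count=1.0, w_rate=100.0):
    """Combine the three recited factors into one number (assumed linear form).

    importance_values: one importance value per detected part.
    total_links:       total number of links examined.
    The weights are hypothetical tuning parameters.
    """
    detected = len(importance_values)
    # Ratio of detected parts to all links; guard against an empty site.
    rate = detected / total_links if total_links else 0.0
    return (w_importance * sum(importance_values)
            + w_count * detected
            + w_rate * rate)
```

Under these assumptions a larger total score indicates a site with more, or more important, mismatched parts.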

[0022] A seventh hypertext checking apparatus comprises: the constitutional elements of the first and second hypertext checking apparatus; and an importance calculating unit for outputting an importance value of the parts detected by said condition detecting unit.

[0023] An eighth hypertext checking apparatus comprises: the constitutional elements of the seventh hypertext checking apparatus; and a total score calculating unit for calculating and outputting a total score concerning said hypertext in accordance with at least one factor, or a combination of a plurality of factors, including: the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the ratio of the number of said parts detected by said condition detecting unit to the total number of links.

[0024] In the first, second, seventh, and eighth hypertext checking apparatus, said condition detecting unit may be operated to group the information about said links by predetermined conditions, and to detect the information about the links excluded from said groups.

[0025] In the first, second, seventh, and eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a part having a mismatch between a link source description and contents on the link target page. In this case, said condition detecting unit may be operated to calculate a criteria score of the link based on at least one of the criteria scores of the links including: (1) a first criteria score calculated by comparing the link source descriptions of the links for the same link target page with each other; (2) a second criteria score calculated by comparing the target pages of a plurality of links represented by the same link source description with each other; (3) a third criteria score calculated by comparing the link target pages based on a plurality of links for the same link target page and the same link source description with each other; and (4) a fourth criteria score calculated by comparing the link source description and the link target page in the contents, and said condition detecting unit is operated to detect a part with a high criteria score.
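As a minimal sketch of the fourth criteria score only, the link source description may be compared with the contents of the link target page by counting how many words of the description are absent from the page. The word-overlap measure and the function names are assumptions introduced for illustration; the specification does not state how the comparison is performed.

```python
import re


def tokenize(text):
    """Lower-cased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def content_mismatch_score(description, target_text):
    """Fraction of description words that do not occur in the target page.

    0.0 means every word of the link source description appears in the
    target contents; 1.0 means none of them does (assumed scoring scale).
    """
    words = tokenize(description)
    if not words:
        # Nothing to compare; a link without a description is a separate case.
        return 0.0
    missing = words - tokenize(target_text)
    return len(missing) / len(words)
```

A link whose score exceeds a chosen threshold would then be reported as a part with a high criteria score.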

[0026] In the first, second, seventh, and eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a part having a mismatch between a link source description and contents on the link target page that is caused by correcting contents in the link target page.

[0027] In this case, said condition detecting unit may be operated to calculate a criteria score of the link based on at least one of the criteria scores of the links including: (1) a first criteria score calculated by comparing the link source descriptions of the links for the same link target page with each other; (2) a second criteria score calculated by detecting at least a notice description including a movement notice description and an expiration notice description in the contents of the link target page; and (3) a third criteria score calculated by comparing the description of the period of validity described in the contents of the link target page with the present date and time, and said condition detecting unit is operated to detect a part with a high criteria score.
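The second and third criteria scores above may be sketched, purely as an example, with simple pattern matching on the contents of the link target page. The notice phrases, the "valid until YYYY-MM-DD" date format, and the one-point-per-criterion scoring are all assumptions of this sketch and are not taken from the specification.

```python
import re
from datetime import date

# Hypothetical movement / expiration notice phrases.
NOTICE_PATTERNS = [r"this page has moved", r"no longer available", r"has expired"]
# Assumed format of a period-of-validity description.
VALID_UNTIL = re.compile(r"valid until (\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)


def expiration_score(page_text, today):
    """Return 0, 1, or 2: one point for a notice, one for a lapsed validity date."""
    score = 0
    lowered = page_text.lower()
    if any(re.search(pattern, lowered) for pattern in NOTICE_PATTERNS):
        score += 1
    match = VALID_UNTIL.search(page_text)
    if match:
        try:
            if date(*map(int, match.groups())) < today:
                score += 1
        except ValueError:
            pass  # not a real calendar date; ignore it
    return score
```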

[0028] In the first, second, seventh, and eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a part causing disunity among a plurality of different link source descriptions having the same link target page.

[0029] In the first, second, seventh, and eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a part causing disunity in styles among a plurality of different link source descriptions within the same page and its peripheral pages.

[0030] In the third through sixth hypertext checking apparatus, said condition detecting unit may be operated to group the information about said links by predetermined conditions, and to detect the information about particular links excluded from said groups, while said candidate providing unit may be operated to obtain the correction candidate so as to make the information about said particular links uniform with that of the other, correct links.
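One possible reading of the grouping step and of the correction candidate is sketched below: links are grouped by link target page, the most frequent link source description in each group is treated as the correct one, and every link that departs from it is reported together with that description as its candidate. Grouping by target page and the majority rule are assumptions chosen for this sketch.

```python
from collections import Counter, defaultdict


def find_disunity(links):
    """links: iterable of (source_page, description, target_page) tuples.

    Returns (source_page, description, target_page, candidate) for each link
    whose description differs from the clear majority for its target page.
    """
    by_target = defaultdict(list)
    for source, description, target in links:
        by_target[target].append((source, description))

    findings = []
    for target, group in by_target.items():
        counts = Counter(description for _, description in group)
        if len(counts) < 2:
            continue  # all descriptions agree
        (majority, top), (_, second) = counts.most_common(2)
        if top == second:
            continue  # no clear majority, so no candidate is offered
        for source, description in group:
            if description != majority:
                findings.append((source, description, target, majority))
    return findings
```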

[0031] In the third through sixth hypertext checking apparatus, said condition detecting unit may be operated to detect a part having a mismatch between a link source description and contents on the link target page.

[0032] In this case, said condition detecting unit may be operated to calculate a criteria score of the link based on at least one of the following scores of the links including: (1) a first score calculated by comparing the link source descriptions of the links for the same link target page with each other; (2) a second score calculated by comparing the target pages of a plurality of links represented by the same link source description with each other; (3) a third score calculated by comparing the link target pages based on a plurality of links for the same link target page and the same link source description with each other; and (4) a fourth score calculated by comparing the link source description and the link target page in the contents, and said condition detecting unit being operated to detect a part with a high criteria score,

[0033] said candidate providing unit being operated to specify at least one sort of correction candidate including: (1) a correction candidate of the link source description calculated by comparing the link source descriptions of the links for the same link target page with each other; (2) a correction candidate of the link source description calculated by comparing the link target pages based on a plurality of links for the same link source description with each other; (3) a correction candidate of the link source description calculated by comparing the link target pages based on a plurality of links for the same link target page and the same link source description with each other; and (4) a correction candidate of the link source description calculated by comparing the link source description and the link target page in the contents.

[0034] In the third through sixth hypertext checking apparatus, said condition detecting unit may be operated to detect a part having a mismatch between a link source description and contents on the link target page that is caused by correcting contents in the link target page.

[0035] In this case, said condition detecting unit may be operated to calculate a criteria score of the link based on at least one of the criteria scores of the links including: (1) a first criteria score calculated by comparing the link source descriptions of the links for the same link target page with each other; (2) a second criteria score calculated by detecting at least a notice description including a movement notice description and an expiration notice description in the contents of the link target page; and (3) a third criteria score calculated by comparing the description of the period of validity described in the contents of the link target page with the present date and time, and said condition detecting unit is operated to detect a part with a high criteria score, said candidate providing unit being operated to specify at least one sort of correction candidate including: (1) a correction candidate of the link source description calculated by comparing the link source descriptions of the links for the same link target page with each other; and (2) a correction candidate of the link source description calculated by extracting the information about a movement destination from within the contents of the link target page.

[0036] In the third through sixth hypertext checking apparatus, said condition detecting unit may be operated to detect a part causing disunity among a plurality of different link source descriptions having the same link target page, said candidate providing unit being operated to calculate the correction candidate of the link source description by comparing the link source descriptions of the links for the same link target page with each other.

[0037] In the third through sixth hypertext checking apparatus, said condition detecting unit may be operated to detect a part causing disunity in styles among a plurality of different link source descriptions within the same page and its peripheral pages, said candidate providing unit being operated to calculate the correction candidate of the style of the link source description by comparing the styles of a plurality of link source descriptions within the page including the detected parts and its peripheral pages.

[0038] In the second through sixth hypertext checking apparatus, said information collecting unit may be operated to repeatedly collect the information about the page and link in the hypertext, and to store said information about the page and link as collected at a plurality of times in said information storing unit. In this case, said condition detecting unit may be operated to refer to said information storing unit to calculate a change over time in the number of links targeting a page whose contents have been corrected, and a change over time in the sorts of link source description, so as to detect a part having a mismatch between the link source description and the contents of the link target page.
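The use of information collected at a plurality of times may be illustrated by comparing two snapshots of the link information. In the sketch below, a target page that has lost a large share of its incoming links between the snapshots is assumed to have changed, and the links that still point to it are reported for review. The heuristic, the `drop_ratio` parameter, and the function name are assumptions; the specification only states that the change over time is calculated.

```python
from collections import Counter


def stale_link_suspects(older, newer, drop_ratio=0.5):
    """older, newer: lists of (source_page, description, target_page) tuples.

    Returns the links in `newer` whose target page lost at least
    `drop_ratio` of the incoming links it had in `older`.
    """
    before = Counter(target for _, _, target in older)
    after = Counter(target for _, _, target in newer)
    shrinking = {
        target for target, count in before.items()
        if after[target] <= count * (1 - drop_ratio)
    }
    return [link for link in newer if link[2] in shrinking]
```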

[0039] In the first through eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a link having no link source description.

[0040] In the first through eighth hypertext checking apparatus, said condition detecting unit may be operated to detect links including a link having neither a character string nor an image described as the link source description, and a link having a character string or an image described as the link source description in an inconspicuous color or size.
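A minimal sketch of such a check follows. The dictionary keys (`text`, `image`, `color`, `font_px`), the background color, and the minimum font size are hypothetical and serve only to make the two conditions concrete; the corresponding check for inconspicuous images is omitted for brevity.

```python
def is_phantom(link, background="#ffffff", min_font_px=6):
    """Return True if the link is effectively invisible to the audience.

    link: dict with optional keys "text", "image", "color", "font_px"
          (all hypothetical field names for this sketch).
    """
    text = link.get("text", "").strip()
    has_image = bool(link.get("image"))
    if not text and not has_image:
        return True  # no link source description at all
    if text and not has_image:
        if link.get("color", "").lower() == background.lower():
            return True  # text drawn in the background color
        if link.get("font_px", min_font_px) < min_font_px:
            return True  # text too small to notice
    return False
```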

[0041] In the first through eighth hypertext checking apparatus, said condition detecting unit may be operated to detect a part in which all of the link source descriptions in a group of links forming a loop and corresponding to this group of links are related to the same topic.
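For illustration, a loop of this kind may be found with a depth-first search over the link graph. In the sketch below, "related to the same topic" is approximated by requiring an identical link source description on every link of the loop; that approximation and the function names are assumptions. The recursion is adequate for small examples only.

```python
from collections import defaultdict


def find_loop(edges):
    """edges: list of (source_page, target_page). Return one cycle or None."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    state = {}   # page -> "open" (on the current path) or "done"
    path = []    # pages on the current depth-first path

    def visit(page):
        state[page] = "open"
        path.append(page)
        for nxt in graph[page]:
            if state.get(nxt) == "open":
                return path[path.index(nxt):]  # back edge closes a loop
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        state[page] = "done"
        path.pop()
        return None

    for page in list(graph):
        if page not in state:
            found = visit(page)
            if found:
                return found
    return None


def same_topic_loops(links):
    """links: (source_page, description, target_page) tuples.

    Returns {description: loop} for each description whose links form a loop.
    """
    by_description = defaultdict(list)
    for source, description, target in links:
        by_description[description].append((source, target))
    loops = {}
    for description, edges in by_description.items():
        loop = find_loop(edges)
        if loop:
            loops[description] = loop
    return loops
```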

[0042] In the fourth through seventh hypertext checking apparatus, said importance calculating unit may be operated to calculate the importance value based on at least one factor, or a combination of a plurality of factors, including: (1) the sort of error or unsuitability of the detected parts; (2) the accuracy of the error or unsuitability of the detected parts; (3) the number of links targeting the page including the detected parts; (4) a record of the frequency of access by users to the page including the detected parts; and (5) the stratification level, in the hypertext, of the page including the detected parts, while said importance calculating unit may be operated to calculate the importance value of the detected parts, and to control, in accordance with the level of said importance value, the output conditions for the detected parts, including the number of records to be output and the method of outputting the records.
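One conceivable way of combining the five factors into a single importance value is sketched below. The per-kind weights, the logarithmic damping of the link and access counts, and the division by depth are assumptions made only to give a concrete, monotonic example; none of them is stated in the specification.

```python
import math

# Hypothetical weights for the sort of error or unsuitability.
KIND_WEIGHT = {"error": 3.0, "expired": 2.0, "disunity": 1.0}


def importance(kind, accuracy, inbound_links, accesses, depth):
    """Assumed importance formula.

    kind:          sort of error or unsuitability (factor 1)
    accuracy:      confidence of the detection, 0.0 to 1.0 (factor 2)
    inbound_links: number of links targeting the page (factor 3)
    accesses:      recorded number of accesses to the page (factor 4)
    depth:         stratification level, 0 for the top page (factor 5)
    """
    base = KIND_WEIGHT.get(kind, 1.0) * accuracy
    reach = math.log1p(inbound_links) + math.log1p(accesses)
    return base * (1.0 + reach) / (1 + depth)
```

With this form, a detected part on a frequently accessed top-level page receives a higher importance value than the same part on a deep, rarely visited page.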

[0043] In the second through eighth hypertext checking apparatus, said information collecting unit may be operated to extract the character strings corresponding to said link source description by character recognition when the link source description is an image, and to register the extracted character strings as said information about the page and link in said information storing unit.

[0044] The first through eighth hypertext checking apparatus may target a hypertext on a Web site.

[0045] In accordance with another aspect of the present invention, there is provided a first hypertext checking method comprising the steps of: (a) determining conditions for the check of a hypertext database so as to detect parts including: a part having an error in a link source description; a part having an error in a relationship between links; a part having unsuitability in a link source description; and a part having unsuitability in a relationship between links; and (b) displaying, on a display screen, a list having three items including: (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page.

[0046] In the above hypertext checking method, said step (b) may include the step of displaying a list sorted by each of three items including: (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page.

[0047] The above hypertext checking method may further comprise the steps of: (b) displaying, on a display screen, a list having three items including: (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page; (c) allowing an operator to correct said items (1), (2), and (3) on said display screen; and (d) reflecting all of said items corrected in said step (c) to correct said hypertext database.

[0048] The above hypertext checking method may further comprise the step of specifying the targeted hypertext database.

[0049] A second hypertext checking method comprises the steps of: (a) collecting information about a page and link in a Web site; (b) referring to the result of said step (a) to detect some parts of a logically mismatched link; (c) calculating an importance value of the part detected in said step (b) and calculating a total score concerning the Web site; (d) periodically performing said steps (a) to (c) for a Web site specified as a target; and (e) reporting a change over time in said total score concerning the specified Web site.

[0050] A third hypertext checking method comprises the steps of: (a) collecting information about a page and link in a Web site; (b) referring to the result of said step (a) to detect some parts of a logically mismatched link; (c) calculating an importance value of the part detected in said step (b) and calculating a total score concerning the Web site; (d) periodically performing said steps (a) to (c) for a Web site specified as a target; and (e) issuing an alert when said total score concerning the specified Web site and said importance value of the detected part satisfy a predetermined condition.

[0051] A fourth hypertext checking method comprises the steps of: (a) collecting information about a page and link in a Web site; (b) referring to the result of said step (a) to detect some parts of a logically mismatched link; (c) calculating an importance value of the part detected in said step (b) and calculating a total score concerning the Web site; (d) periodically performing said steps (a) to (c) for a plurality of Web sites each specified as a target; and (e) outputting a ranking of said total scores of the specified plural Web sites in order of level.
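Step (e) of the fourth method may be sketched as a simple sort over the total scores of the specified sites. The site names in the usage note are placeholders, and the assumption that a lower total score is better (fewer or less important mismatched parts) is made only for this example.

```python
def rank_sites(scores):
    """scores: dict mapping a site identifier to its total score.

    Returns (site, score) pairs ordered from the lowest total score to the
    highest; lower is assumed to mean fewer or less important problems.
    """
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, `rank_sites({"site-a": 12.5, "site-b": 3.0})` places `site-b` first.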

[0052] In accordance with the first through eighth hypertext checking apparatus, the processes including the steps of grouping the link information by particular conditions, and detecting a particular link excluded from the group as a mismatched link, are performed so as to have the condition detecting unit detect the logically mismatched link, thereby making it possible to achieve the first object of the present invention.

[0053] In accordance with the third through sixth hypertext checking apparatus, the candidate providing unit is operated to perform the process of calculating the correction candidate so as to harmonize the link information of the particular link with the link information of the large majority of the other, appropriate links, thereby making it possible to achieve the second object of the present invention.

[0054] In accordance with the first through sixth hypertext checking apparatus, the logical mismatch is automatically detected by the condition detecting unit. In accordance with the third through sixth hypertext checking apparatus, the correction candidate is automatically calculated by the candidate providing unit. In the fifth hypertext checking apparatus, the logically mismatched parts are automatically corrected by the correction reflecting unit. Therefore, the third object of the present invention can be achieved.

BRIEF DESCRIPTION OF THE DRAWINGS

[0055] The present invention and many of the advantages thereof will be better understood from the following detailed description when considered in connection with the accompanying drawings, wherein:

[0056] FIG. 1 is a block diagram of a first embodiment of the hypertext checking apparatus according to the present invention;

[0057] FIG. 2A is a diagram showing examples of a document described in the format of a hypertext on which some links are specified;

[0058] FIG. 2B is a diagram showing examples of a display screen of the document viewed through a browser;

[0059] FIG. 3 is a diagram showing one example of a logical mismatch due to an error link;

[0060] FIG. 4A is a diagram showing one example of a logical mismatch due to an expiration period link;

[0061] FIG. 4B is a diagram showing one example of a logical mismatch due to an expiration period link;

[0062] FIG. 5 is a diagram showing one example of a logical mismatch due to disunity in link source descriptions;

[0063] FIG. 6A is a diagram showing one example of a logical mismatch due to disunity in styles of link source descriptions;

[0064] FIG. 6B is a diagram showing one example of a logical mismatch due to disunity in styles of link source descriptions;

[0065] FIG. 7A is a diagram showing one example of a logical mismatch due to a phantom link;

[0066] FIG. 7B is a diagram showing one example of a logical mismatch due to a phantom link;

[0067] FIG. 8 is a diagram showing one example of a logical mismatch due to a loop link;

[0068] FIG. 9 is a table of an example of the link information stored in an information storing unit;

[0069] FIG. 10 is a flowchart showing the operation of the first embodiment of the hypertext checking apparatus according to the present invention shown in FIG. 1;

[0070] FIG. 11 is a diagram of an example of a display screen for setting a document collection condition in the first embodiment of the hypertext checking apparatus according to the present invention;

[0071] FIG. 12 is a diagram of an example of a display screen for setting an extraction condition for the mismatched link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0072] FIG. 13 is a diagram of an example of a display screen of a list of results of the extracted mismatched link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0073] FIG. 14 is a flowchart showing the process of extracting the error link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0074] FIGS. 15A to 15D are tables of examples of the link information extracted in respective steps in the process of extracting the error links shown in FIG. 14 in the first embodiment of the hypertext checking apparatus according to the present invention;

[0075] FIG. 16 is a flowchart showing the process of extracting the expiration period link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0076] FIG. 17 is a flowchart showing the process of extracting the disunity in the link source descriptions in the first embodiment of the hypertext checking apparatus according to the present invention;

[0077] FIG. 18 is a table of an example of the link information in the step of the process of extracting the disunity in the link source descriptions shown in FIG. 17 in the first embodiment of the hypertext checking apparatus according to the present invention;

[0078] FIG. 19 is a flowchart showing the process of extracting the disunity in the styles of the link source pages in the first embodiment of the hypertext checking apparatus according to the present invention;

[0079] FIG. 20 is a table of an example of the link information in the step of the process of extracting the disunity in the styles of the link source pages shown in FIG. 19 in the first embodiment of the hypertext checking apparatus according to the present invention;

[0080] FIG. 21 is a flowchart showing the process of extracting the phantom link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0081] FIG. 22 is a flowchart showing the process of extracting the loop link in the first embodiment of the hypertext checking apparatus according to the present invention;

[0082] FIG. 23 is a flowchart showing the process of extracting the link varied with time in the link information in the first embodiment of the hypertext checking apparatus according to the present invention;

[0083] FIG. 24 is a table of an example of the link information extracted in the step of the process of extracting the links varied with time in the link information shown in FIG. 23 in the first embodiment of the hypertext checking apparatus according to the present invention;

[0084] FIG. 25 is a block diagram of a second preferred embodiment of the hypertext checking apparatus according to the present invention;

[0085] FIG. 26 is a flowchart showing the operations of the second preferred embodiment of the hypertext checking apparatus according to the present invention shown in FIG. 25;

[0086] FIG. 27 is a diagram showing an example of a display screen of a list of results of the extracted mismatched link in the second preferred embodiment of the hypertext checking apparatus according to the present invention;

[0087] FIG. 28 is a block diagram of a third preferred embodiment of the hypertext checking apparatus according to the present invention;

[0088] FIG. 29 is a flowchart showing the operations of the third preferred embodiment of the hypertext checking apparatus according to the present invention shown in FIG. 28;

[0089] FIG. 30 is a diagram showing an example of a display screen of a line chart of a change with time in a total score in the third preferred embodiment of the hypertext checking apparatus according to the present invention;

[0090] FIG. 31 is a diagram showing an example of a display screen of a bar graph of a site ranking in the total score in the third preferred embodiment of the hypertext checking apparatus according to the present invention;

[0091] FIG. 32 is a block diagram of a fourth, fifth, and sixth preferred embodiment of a system comprising a hypertext checking program according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0092] The hypertext means a set of documents structured with hyperlinks, or links, and has a structure including links provided between the documents. One typical example of the hypertext is the WWW (World Wide Web). The WWW is a collection of hypertexts described in the HTML (Hyper Text Markup Language) format, such as the document shown in FIG. 2A. The links and anchor character strings are marked with the <A> tag. The document 101 shown in FIG. 2A has href attributes of the <A> tags indicative of identification information of the documents 102, 103, and 104. The identification information of a document is generally referred to as “a URL” or “a web address” in the WWW, but will be referred to simply as “an address” in the present invention. The character strings “GX0011”, “GX0012”, and “GX0013” interposed between the <A> tags are generally referred to as “anchor character strings”. Because an image file is often interposed between the <A> tags, the image as well as the character string interposed between the <A> tags will be referred to as “a link source description” in the present invention, and the two will be treated in the same manner.
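As a minimal sketch of how the address and the link source description of each <A> tag might be collected from an HTML document, the standard-library `html.parser` module can be used as follows. The class and function names, and the way an image is recorded as `[image: ...]`, are assumptions of this sketch; the file names in any input are likewise placeholders, since the addresses in FIG. 2A are not reproduced in the text.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (address, link source description) pairs from <A> tags."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, description) pairs
        self._href = None   # href of the <a> element currently open, if any
        self._parts = []    # text and image fragments seen inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._parts = []
        elif tag == "img" and self._href is not None:
            # An image between the <A> tags is also a link source description.
            self._parts.append("[image: %s]" % attrs.get("src", ""))

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            description = " ".join(part for part in self._parts if part)
            self.links.append((self._href, description))
            self._href = None


def extract_links(html_text):
    """Return the list of (address, description) pairs found in html_text."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

A link whose description comes back as an empty string has no link source description at all.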

[0093] The <A> tag described in the document 101 shown in FIG. 2A may have not only the href attribute but also a target attribute, a style attribute, or the like. The target attribute specifies which type of window is used to display the document of the link target, or link destination. The style attribute specifies the font size, the color, or the highlighted representation used to display the link source description of the link. When the document 101 shown in FIG. 2A is viewed with a browser, the document 101 may be displayed on the display screen as shown in FIG. 2B. The document 101 has links 201, 202, and 203 to the documents 102, 103, and 104, respectively, with the link source descriptions “GX0011”, “GX0012”, and “GX0013”, respectively. The document 102 may be accessed by way of the link 201 when the link source description “GX0011” in the document 101 is clicked. Similarly, the documents 103 and 104 may be accessed by way of the links 202 and 203, respectively, when the link source descriptions “GX0012” and “GX0013”, respectively, in the document 101 are clicked.

[0094] Although the WWW has been explained above as a typical example of the hypertext, the present invention is not limited to the WWW. The hypertext may be described in any language, including not only HTML but also, for example, XML (Extensible Markup Language), SGML (Standard Generalized Markup Language), and so on.

[0095] In order to avoid any confusion over the term “user”, a person who visits a company, organization, or personal site to browse the hypertext is referred to as an “audience”, while a person who utilizes the present invention to administer the hypertext is referred to as an “administrator”, in the present invention.

[0096] The administration of the hypertext, however, becomes complex and difficult as the amount of information published on the Internet increases. The rate of mismatched links, such as a link inappropriate for its link source description or a link with a mistaken link target, therefore increases. Mismatched links may be roughly classified into two types: physical mismatches and logical mismatches.

[0097] A physical mismatch means that it is physically impossible to access the link target, for example, in cases where there is no document at the link target or where a server of the link target is down. When a document having such a physical mismatch is accessed, the server or the client returns an error message.

[0098] In the event of a logical mismatch, it may be physically possible to access the link target, but there is a logical error in the link, such as a link to a page describing wrong product information or expired campaign information. When a document including the logically mismatched part is accessed, the server does not return any error message, because the document at the link target exists and the server of the link target runs in good order. The audience is, however, sometimes confused by an error link, and the administrator sometimes has to respond to applications made by the audience for the expired campaign. The logical mismatches therefore have implications no less significant than those of the physical mismatches. Examples of the logical mismatch include, but are not limited to, (1) putting a link to a wrong destination, (2) putting a link to an expired information, (3) disunity in the link source descriptions, (4) disunity in the styles of the link source descriptions, (5) a phantom link, and (6) a loop link. An example of each logical mismatch is described in detail below with reference to the drawings.

[0099] (1) Putting a Link to a Wrong Destination

[0100] As shown in FIG. 3, “putting a link to a wrong destination” means a mismatch between the contents expected from the link source description and the actual contents of the document at the link target. In FIG. 3, the link source descriptions of all of the links 211, 212, 213, and 214 are the same, namely “GX0011”. The links of the documents 111, 112, and 113 all indicate the same document 116, which is the product introduction of “GX0011”, but the link of the document 114 indicates the wrong document 117, which is the product introduction of “GX0012”. Therefore the audience can access the document 116 for the introduction information of “GX0011”, as expected, when browsing the documents 111, 112, and 113, but, contrary to expectation, cannot access the document 116 when browsing the document 114. The audience who browses the document 114 sees a product introduction which is different from that expected from the link source description “GX0011”. This will confuse the audience.

[0101] Moreover, all of the links 211, 212, 213 and 215 indicate the same document 116, but the link source description of only the link 215 is wrongly described as “GX0012”. Therefore, the audience who browses the document 115 sees a product introduction which is different from that expected from the link source description “GX0012”. This will confuse the audience.

[0102] Furthermore, the document 115 has two links 215 and 216, which are put to the documents 116 and 117, respectively. Both of the links 215 and 216, however, have the same link source description “GX0012”. Therefore, the audience who browses the document 115 finds the different contents of the documents 116 and 117 in spite of the fact that the audience selects the same link source description “GX0012”.

[0103] In this embodiment, the example of putting the link to the wrong destination described above is the error link to the product information, but the mismatch is not limited to this, and may further include a mistake of putting a link between an English document and a Japanese document, a link to a completely unrelated page, and so forth.

[0104] (2) Putting a Link to an Expired Information

[0105] As shown in FIG. 4, “putting a link to an expired information” means a mismatch caused by a remaining expired campaign, or a remaining closed service. FIG. 4A shows a group of the documents as of Aug. 15, 2002, while FIG. 4B shows a group of the documents as of Sep. 15, 2002.

[0106] In FIG. 4A, it is announced, in the document 125, that a campaign is conducted for a limited time between Jul. 20, 2002 and Aug. 31, 2002. The documents 121, 122, 123 and 124 have the same link source description “free admission fee” for putting links 221, 222, 223 and 224, respectively, to the document 125 having the contents of the campaign.

[0107] Meanwhile, in FIG. 4B, it is announced, in the document 125, that the campaign has ended because its period has expired. In the documents 121, 122 and 123, therefore, the link to the document 125 for the campaign has already been eliminated. In the document 124, however, the link to the document 125 for the expired campaign has not been eliminated yet, and therefore the link 224 to the document 125 and the link source description “free admission fee” are still left. Thus, the audience who browses the document 124 cannot, contrary to expectation, be provided with the service shown in the link source description “free admission fee”.

[0108] In this embodiment, the example of putting a link to an expired information described above is the link to the expired campaign, but the mismatch is not limited to this, and may further include a mismatch caused by transferring an initial document from an original address to another address and replacing this initial document with another document at the original address. Furthermore, the original period may be unlimited. The link to the expired information in this embodiment may further include a mismatch caused by abandoning the service at the link target, or closing a site for some reason. The case where the document is eliminated due to the expiration, however, is included in the physical mismatch, because an error occurs when the document is accessed. The expired link may be considered a type of the error link, but in the present invention, a link whose destination has expired is especially distinguished from the error link and specified as the expired link.

[0109] (3) Disunity in Link Source Descriptions

[0110] As shown in FIG. 5, the disunity in the link source descriptions means a mismatch in which the link source descriptions fluctuate because they are not unified. In FIG. 5, the documents 131, 132, 133, and 134 put the links 231, 232, 233, and 234 to the document 135. The link source descriptions of the links 231, 232, and 233 are all “GX Series”, whereas the link source description of the link 234 is “gX Series”. Therefore, the audience who browses the document 134 may misunderstand that an item “gX Series” different from “GX Series” exists, and then follow the link 234.

[0111] In this embodiment, the example of the disunity in the link source descriptions described above is the fluctuation between capital and small letters in the link source description, but the mismatch is not limited to this, and may further include: a fluctuation between an English description and a description in “katakana”, a kind of Japanese character; a fluctuation between different “katakana” descriptions, such as “vaiorin” and “baiorin”, both corresponding to “violin” in English; a fluctuation between a “katakana” description and a description in “hiragana”, another kind of Japanese character; a fluctuation between vague or similar expressions, such as “event information” and “seminar information”; and a spelling error such as “Series” and “Selies”.
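
By way of a non-limiting illustration, the capital/small letter fluctuation of FIG. 5 may be detected as sketched below in Python. The function names `normalize` and `find_disunited`, the tuple layout of the links, and the choice of the most frequent spelling as the unified description are assumptions made for this sketch and are not taken from the embodiment. Only differences in letter case and spacing are handled; the “katakana”, similar-expression, and spelling-error fluctuations are outside this sketch.

```python
from collections import Counter


def normalize(description):
    # Fold case and drop whitespace so that "GX Series" and "gX Series" compare equal.
    return "".join(description.split()).casefold()


def find_disunited(links):
    """links: iterable of (link_id, link_target, link_source_description).

    Returns {link_id: majority_description} for every link whose description
    differs from the most frequent spelling among links to the same target
    that are equal after normalization.
    """
    by_key = {}
    for link_id, target, description in links:
        key = (target, normalize(description))
        by_key.setdefault(key, []).append((link_id, description))

    result = {}
    for members in by_key.values():
        counts = Counter(description for _, description in members)
        majority, _ = counts.most_common(1)[0]
        for link_id, description in members:
            if description != majority:
                result[link_id] = majority
    return result


# Links of FIG. 5: three links spelled "GX Series" and one spelled "gX Series".
FIG5_LINKS = [
    (231, "document 135", "GX Series"),
    (232, "document 135", "GX Series"),
    (233, "document 135", "GX Series"),
    (234, "document 135", "gX Series"),
]
```

With the links of FIG. 5, only the link 234 is reported, together with “GX Series” as the suggested unified description.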

[0112] (4) Disunity in the Style of the Link Source Description

[0113] As shown in FIG. 6, the disunity in the style of the link source description means a mismatch in which links look different, or behave differently when clicked, for example, due to different style or target attributes. In FIG. 6A, the document 141 has four links 241, 242, 243, and 244, in three of which the target attribute is specified as “_blank” so as to open a pop-up window to display the page of the link target thereon. Therefore, the audience who browses the document 141 as shown in FIG. 6B may browse the documents 142, 143, and 144 of the link targets of the links 241, 242, and 243 one after another, while keeping the document 141 displayed on the screen. Displaying the page of the link target in a pop-up window is often convenient, particularly for browsing a collection of links, because the audience may browse the documents of the different link targets one after another while still viewing the original document containing the collection of links. Meanwhile, no target attribute is specified in the link 244, so that the displayed document is replaced when the link is clicked. Therefore, after clicking the link 244, the audience has to look for a link to return to the original document 141, or use the back button of the browser.

[0114] In this embodiment, the example of the disunity in the style of the link source description described above is the disunity in the target attribute in the document, but the mismatch is not limited to this, and may further include a mismatch in which some links have a different color or a different highlighted representation due to the disunity in the style attribute.
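
The situation of FIG. 6A may be illustrated by the following minimal Python sketch, which reports the links of one document whose target attribute differs from the attribute used by most links of that document. The function name `find_target_outliers` and the use of `None` for an unspecified target attribute are assumptions of this sketch; the embodiment does not prescribe this particular rule.

```python
from collections import Counter


def find_target_outliers(links):
    """links: list of (link_id, target_attribute_or_None) for one document.

    Returns a list of (link_id, majority_attribute) for links whose target
    attribute differs from the most frequent one in the document.
    """
    if not links:
        return []
    counts = Counter(attribute for _, attribute in links)
    majority, _ = counts.most_common(1)[0]
    return [(link_id, majority) for link_id, attribute in links
            if attribute != majority]


# Links of the document 141 in FIG. 6A: three open a pop-up window, one does not.
FIG6_LINKS = [(241, "_blank"), (242, "_blank"), (243, "_blank"), (244, None)]
```

For the document 141, the sketch reports the link 244 and suggests “_blank” as the unified target attribute.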

[0115] (5) A Phantom Link

[0116] As shown in FIG. 7, the phantom link means a mismatch in which the audience browsing a document cannot find a link in the document in spite of the fact that the link is described in the HTML description of the document. In FIG. 7A, there is an <A> tag specifying the link target as “HIDDEN_URL”, which is interposed between a character string indicative of a header, such as “stock status of GX Series”, and a <TABLE> tag indicative of a table. There is, however, no character string or image interposed between these <A> tags. Therefore, the audience cannot notice that there is a link between the header and the table in the view shown in FIG. 7B, when the document 151 is browsed with the browser. It is easy for a crawler to search for and follow such links, but it is difficult for the administrator to find them. Suppose that the link target “HIDDEN_URL” is indicative of a confidential file such as a customer list. The information stored in the confidential file can then be easily acquired by the crawler, while there is a danger that a human being cannot notice this leakage.

[0117] In this embodiment, the phantom link described above has no link source description, but the phantom link is not limited to this, and may further include a mismatch in which it is difficult to visually recognize the link through the browser because the link source description is a transparent image, an extremely small image or character, or an image or character of the same color as the background. Even if it is possible to see the link source description, it is impossible to distinguish the link from the body text when the style of the link source description is the same as that of the body text and there is no highlighted representation. This case, therefore, is also included in the phantom link, because the link cannot be visually confirmed on the display screen of the browser.
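
The simplest case described above, an <A> element with no character string or image between its tags, may be detected as in the following Python sketch, which uses the standard `html.parser` module. The class name `PhantomLinkFinder`, the function name `find_phantom_links`, and the sample markup are assumptions made for illustration. Transparent or very small images and links styled like the body text are not detected by this sketch.

```python
from html.parser import HTMLParser


class PhantomLinkFinder(HTMLParser):
    """Collects href values of <A> elements that contain no text and no image."""

    def __init__(self):
        super().__init__()
        self.phantoms = []
        self._href = None      # href of the <A> element currently open, if any
        self._visible = False  # True once text or an <IMG> is seen inside it

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._visible = False
        elif tag == "img" and self._href is not None:
            self._visible = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._visible = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._visible:
                self.phantoms.append(self._href)
            self._href = None


def find_phantom_links(html):
    finder = PhantomLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.phantoms


# Hypothetical markup modeled on FIG. 7A: an empty link between a header and a table.
SAMPLE = (
    '<H2>stock status of GX Series</H2>'
    '<A HREF="HIDDEN_URL"></A>'
    '<TABLE><TR><TD><A HREF="gx0011.html">GX0011</A></TD></TR></TABLE>'
)
```

Applied to the sample markup, the sketch reports only the link whose target is “HIDDEN_URL”.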

[0118] (6) A Loop Link

[0119] As shown in FIG. 8, the loop link means a mismatch in which the audience follows links for certain information one after another, only to return to the original page. In FIG. 8, the document 161 has a link 261 to the document 162 with the link source description “Information about a present”. Furthermore, the document 162 has a link 262 to the document 163 with the link source description “Digital camera present”. Moreover, the document 163 has a link 263 to the document 161 with the link source description “Click here to a present”. When the audience who browses the document 161 is interested in the sentence “Information about a present” in the document 161, the audience will follow the link 261. The audience may then find that there is also the link 262 having the link source description “Digital camera present” in the document 162. Therefore, the audience may expect more information about the present to be found by following the next link, and then may access the document 163. However, the document 163 also has the link source description “Click here to a present”. Therefore, the audience may intend to acquire the desired information and then follow the link 263. In the end, the link 263 leads back to the original document 161. The audience may be confused about where he/she can find the right information. Thus, the loop link causes a problem that the audience wanders through documents without obtaining any desired information.
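
A loop such as that of FIG. 8 may be found by following the links from a starting document until the starting document is reached again. The Python sketch below does this with a depth-first search. The function name `find_loop` and the representation of links as (link source, link target) pairs are assumptions of this sketch, and the sketch does not consider the link source descriptions, which the embodiment may also take into account.

```python
def find_loop(links, start):
    """links: iterable of (link_source, link_target) pairs.

    Returns the list of documents on a path that leaves `start` and returns
    to it, or None when no such path exists.
    """
    graph = {}
    for source, target in links:
        graph.setdefault(source, []).append(target)

    path = [start]
    visited = {start}

    def walk(node):
        for nxt in graph.get(node, []):
            if nxt == start:
                return True
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if walk(nxt):
                    return True
                path.pop()  # dead end: this document does not lead back
        return False

    return path + [start] if walk(start) else None


# Links 261, 262, and 263 of FIG. 8.
FIG8_LINKS = [
    ("document 161", "document 162"),
    ("document 162", "document 163"),
    ("document 163", "document 161"),
]
```

For the links of FIG. 8 and the starting document 161, the sketch returns the path through the documents 161, 162, 163, and back to 161.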

[0120] First Preferred Embodiment

[0121] Referring now to FIG. 1 of the drawings, there is shown a first preferred embodiment of the hypertext checking apparatus according to the present invention.

[0122] As shown in FIG. 1, the first embodiment of the hypertext checking apparatus according to the present invention includes a data processing unit 1 operated under program control, a storage device 2 capable of storing information, an input unit 3, such as a keyboard, and an output device 4, such as a displaying unit, a printer, and so on.

[0123] The data processing unit 1 includes an information collecting unit 11, a candidate providing unit 12, a condition detecting unit 13, and a correction reflecting unit 14.

[0124] The storage device 2 includes a hypertext database 21 and an information storing unit 22.

[0125] The information collecting unit 11 is designed to fetch documents from the hypertext database 21 included in the storage device 2, to retrieve link information, and to store the link information in the information storing unit 22. In this embodiment, the link information may include items such as an address of the link source page, an address of the link target page, a link source description, a target attribute, a style attribute, and so on. The information storing unit 22 may record thereon a body of the document, an updated date, a date and time of acquisition, and a condition when the document is acquired, such as an error or success, in addition to the link information.

[0126] The condition detecting unit 13 is designed to group the links stored in the information storing unit 22 in accordance with the link information, and to extract a particular link among the links grouped in a same group as a mismatched link, from the information storing unit 22.

[0127] The candidate providing unit 12 is designed to provide a correction candidate corresponding to the link which is extracted as the mismatched link by the condition detecting unit 13. In this embodiment, the correction candidate includes information about which of the items of the link information of the mismatched link should be corrected, and how it should be corrected. The candidate providing unit 12 outputs the correction candidate to the correction reflecting unit 14.

[0128] The correction reflecting unit 14 is designed to allow the administrator to confirm the outputted mismatched link and the correction candidate so as to reflect the confirmed result to the hypertext database 21.

[0129] The hypertext database 21 is capable of storing therein a set of hypertexts included in targeted sites to be inspected. The local storage device 2 does not need to include the entire hypertext database 21, and some parts of the hypertext database 21 may be distributed over a network, in the same way that a group of hypertexts is distributed over the Internet.

[0130] The information storing unit 22 is capable of storing therein information about the links included in each document in the hypertext database 21. FIG. 9 shows an example of the link information. For example, the link information included in the document 101 shown in FIGS. 2A and 2B is illustrated in FIG. 9. It will be understood from FIG. 9 that the document 101 has a link 201 which is linked to the document 102 by way of a link source description “GX0011”, the target attribute of which is designated by “_blank”, and the style attribute of which is designated by “st01”. Although the link source description is described in a text format in this embodiment, the link source description may be designated by an address of the specified image file when the link source description is specified as an image. Furthermore, there may be provided a character recognition module. The character recognition module may be executed upon the image file so as to extract a text embedded in the image and to store the extracted text in the information storing unit 22.

[0131] The operation of the hypertext checking apparatus of the first embodiment will be described below with reference to FIGS. 1 and 9 to 13.

[0132] Firstly, the information collecting unit 11 is operated to read out the document from the hypertext database 21 based on the collection condition setting inputted by the input unit 3 (the step S1 in FIG. 10). In this embodiment, the document may be accessed by way of HTTP (Hyper Text Transfer Protocol) when the hypertext database 21 is the WWW (World Wide Web). Conventionally, such a function has been implemented with a Web browser, such as IE (Internet Explorer produced by Microsoft Corporation), or with a Web search engine of a robot type, a so-called crawler or spider.

[0133] There is shown in FIG. 11 a display screen of a setting for the collection when the hypertext database 21 is the WWW. As shown in FIG. 11, this display screen is designed to allow the user to specify: a domain name of the site for an analysis target; a target number of pages for documents to be collected; a file extension of the target document; a time interval between accesses to the server; a retry count for failure in collection; a timeout duration for the collection; and a depth of a hierarchy of the recursion when the information is recursively collected by following links. In FIG. 11, the display screen further includes an execute button which is operated to initiate the collection of the hypertexts.

[0134] Next, the HTML descriptions of the collected documents are analyzed by the information collecting unit 11, so that the link information is extracted as shown in FIG. 9 and then stored in the information storing unit 22 (the step S2 in FIG. 10).
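
As a hedged illustration of the step S2, the following Python sketch parses an HTML description with the standard `html.parser` module and produces one record per link holding the items listed for FIG. 9. The names `LinkInfo`, `LinkExtractor`, and `extract_links`, the field names, and the sample addresses are assumptions of this sketch. When an image is interposed between the <A> tags, the sketch records the address of the image file as the link source description, as the embodiment permits.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class LinkInfo:
    source: str            # address of the link source page
    target: str            # address of the link target page (href attribute)
    description: str       # link source description (text or image address)
    target_attr: str = ""  # target attribute, empty when not specified
    style_attr: str = ""   # style attribute, empty when not specified


class LinkExtractor(HTMLParser):
    def __init__(self, source):
        super().__init__()
        self.source = source
        self.links = []
        self._current = None  # LinkInfo of the <A> element currently open
        self._parts = []      # pieces of its link source description

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._current = LinkInfo(self.source, attrs["href"], "",
                                     attrs.get("target") or "",
                                     attrs.get("style") or "")
            self._parts = []
        elif tag == "img" and self._current is not None:
            self._parts.append(attrs.get("src") or "")

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current.description = " ".join(p for p in self._parts if p)
            self.links.append(self._current)
            self._current = None


def extract_links(source, html):
    parser = LinkExtractor(source)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, calling `extract_links("document101.html", html)` on markup containing `<A href="document102.html" target="_blank" style="st01">GX0011</A>` yields a record whose link target is “document102.html” and whose link source description is “GX0011”.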

[0135] The condition detecting unit 13 is then operated to extract the link which fulfills the extraction condition as the mismatched link from the information storing unit 22 based on the extraction conditions inputted by the input unit 3 (the step S3 in FIG. 10).

[0136] There is shown in FIG. 12 a display screen of a setting for the extraction conditions. As shown in FIG. 12, the display screen is designed to allow the user to specify which kinds of mismatched links, such as a dead link, i.e., a physically mismatched link, an error link, a link for expired information, disunity in link source descriptions, disunity in the styles of link source descriptions, a phantom link, and a loop link, are to be extracted. When the link for a particular address has already been proved to be a mismatched link, this address can be inputted to a “particular URL” column as shown in FIG. 12, so that the link including the link target having the inputted address can also be extracted. When too many mismatched links are extracted, the number of records to be displayed on a display screen can be limited. There is also provided an execute button for allowing the user to issue an instruction to start the extraction of the mismatched links.

[0137] The extraction of the dead link among the kinds of the mismatched links can be realized by the aforesaid conventional method, and its description is therefore omitted in this embodiment. The method of extracting the link having a particular URL for a link source is obvious to those skilled in the art, and its description is therefore also omitted in this embodiment. The methods of extracting the remaining logically mismatched links are described below.

[0138] The candidate providing unit 12 is then operated to provide a correction candidate so as to eliminate the mismatch in the link extracted as the mismatched link by the condition detecting unit 13 (the step S4 in FIG. 10), and to output a list of the results on a display screen (the step S5 in FIG. 10).

[0139] There is shown in FIG. 13 an example of the display screen of the list of the results of the extracted mismatched links. The list of the results has a plurality of items such as a kind of mismatched link, a correction candidate, a link ID, a link source, a link target, a link source description, a target attribute, and a style attribute. As shown in FIG. 13, the links are divided into groups such that the links having the same “link target” and “link source description” are grouped in a same group. The grouped links are respectively given kinds of mismatched link and correction candidates and then displayed on the display screen.

[0140] When the link source address or the link target address is clicked, the corresponding document can be accessed. The correction candidate outputted by the system is indicated in the column of the “correction candidate”. The column of the “correction candidate” has two sections divided by a colon “:”, one of which includes the item of the link information to be corrected and the other of which includes information about how to correct it. For example, the representation “link: delete” means that the link should be deleted. The representation “link source description: “What's New”” means that the link source description should be changed to “What's New”. This correction candidate may be re-written by the administrator after confirmation.

[0141] The administrator can then confirm the mismatched link and the correction candidate outputted on the list (the step S6 in FIG. 10). Referring to FIG. 13, the links having the same link target and link source description are grouped. Therefore, once the administrator confirms a representative example of each group of the mismatched links, the administrator does not need to confirm all of the links. For example, it is understood from the list of the results shown in FIG. 13 that all of the links having the link IDs 271 to 274 have the same link target indicative of the document 175, the same link source description indicative of “ox campaign now underway”, the kind of mismatched link indicative of the link for the expired information, and the correction candidate indicative of “link: delete”. Therefore, it is understood that all of the links of the link IDs 271 to 274 should be deleted. All the administrator has to do is to access the document 171 to confirm the validity of the mismatched link and correction candidate of the link 271. The administrator does not have to confirm the remaining links 272 to 274. Therefore, it is possible to cut the cost of the confirmation.
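
The grouping used on the list of the results, and the selection of one representative link per group for confirmation, may be sketched in Python as follows. The function names, the dictionary keys, and the link target given to the links 278 and 279 are assumptions made for illustration; only the values for the links 271 to 274 are taken from the description of FIG. 13 above.

```python
from collections import defaultdict


def group_for_confirmation(records):
    """records: iterable of dicts with the keys 'id', 'target', 'description'.

    Returns {(link target, link source description): [link IDs]} in the order
    in which the groups first appear.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["target"], record["description"])].append(record["id"])
    return dict(groups)


def representatives(groups):
    # The first link of each group stands for the whole group.
    return [ids[0] for ids in groups.values()]


RECORDS = [
    {"id": 271, "target": "document 175", "description": "ox campaign now underway"},
    {"id": 272, "target": "document 175", "description": "ox campaign now underway"},
    {"id": 273, "target": "document 175", "description": "ox campaign now underway"},
    {"id": 274, "target": "document 175", "description": "ox campaign now underway"},
    # The link target below is a placeholder assumed for this sketch.
    {"id": 278, "target": "placeholder target", "description": "What's New"},
    {"id": 279, "target": "placeholder target", "description": "What's New"},
]
```

With these records, the administrator would be asked to confirm only the links 271 and 278 instead of all six links.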

[0142] When there are a plurality of correction candidates, the administrator may be provided with the plurality of correction candidates partitioned by “OR”, such as “link target: document 177 OR link source description: product B” in FIG. 13. In this case, the administrator may select a necessary correction candidate based on the result of the confirmation. When the administrator judges that the correction candidate is wrong in accordance with the result of the confirmation, the administrator may correct this error. For example, the correction candidate of the links 278 and 279 is indicative of “link source description: What's New” in FIG. 13. The correction candidate can be changed to “link target: document 180”, if the administrator considers it appropriate that the link target address should be changed to the document 180. When the administrator judges that the correction should not be done, the column of the correction candidate may be left blank, thereby making it possible to cancel the correction in the following step.

[0143] When the administrator operates the button of “reflect correction” shown in FIG. 13, the correction reflecting unit 14 is operated to correct each of the documents in the hypertext database 21 in accordance with the correction candidates confirmed by the administrator (the step S7 in FIG. 10). When there are a plurality of correction candidates which are still connected with each other by “OR” at this stage, only the first correction candidate may be reflected.
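
The textual form of the correction candidate, in which the item and the correction are divided by a colon and alternatives are partitioned by “OR”, may be interpreted as in the following Python sketch. The function names `parse_candidates` and `select_candidate` are assumptions of this sketch, and the sketch assumes that “OR” is surrounded by single spaces. A blank column yields no candidate, which corresponds to cancelling the correction, and only the first alternative is selected, as in the step S7.

```python
def parse_candidates(text):
    """Split 'item: correction OR item: correction' into (item, correction) pairs."""
    candidates = []
    for part in text.split(" OR "):
        part = part.strip()
        if not part:
            continue
        # Split at the first colon only, so the correction itself may contain colons.
        item, _, correction = part.partition(":")
        candidates.append((item.strip(), correction.strip().strip('"“”')))
    return candidates


def select_candidate(text):
    """Return the first candidate to be reflected, or None for a blank column."""
    candidates = parse_candidates(text)
    return candidates[0] if candidates else None
```

For instance, “link target: document 177 OR link source description: product B” is parsed into two pairs, of which `select_candidate` returns the pair (“link target”, “document 177”).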

[0144] The display screen of the list of the results further includes links “sort” at the items of the link source, the link target, and the link source description, as shown in FIG. 13. These links are adapted to sort the records of the result of extraction by using each item as the sort key. For example, in response to a click of the link “sort” of the item “link source”, the records of the result of extraction can be sorted by the link source document. It is therefore possible to grasp the tendency for each kind of the mismatched links to occur, which is useful for correcting the mismatched links by hand. In response to a click of the link “sort” of the item “link target”, the records of the result of extraction can be sorted by the link target document. It is therefore possible to grasp how the mismatched links occur for a particular document, so that the mismatched links to an important document, such as a document inundated with accesses, can be investigated. In response to a click of the link “sort” of the item “link source description”, the records of the result of extraction can be sorted by the link source description. It is therefore possible to grasp the tendency for each kind of the link source description to cause the mismatch, so that the suitability of the expression for the link source description can be investigated.

[0145] Although it is described in this embodiment that the administrator corrects the link source description, the link target, and so on, in the column of the “correction candidate” displayed on the display screen of the list of the results in FIG. 13, the present invention is not limited to that embodiment. The administrator may directly re-write the records in the columns such as the “link source”, the “link target”, and the “link source description” on the display screen. Further, although it is described in this embodiment that the display screen of the setting for the collection of the hypertexts and the display screen of the setting for the extraction conditions are separately provided, a single display screen may be provided for setting both the collection of the hypertexts and the extraction conditions at the time of starting the analysis in another embodiment. In this case, the steps S1 to S5 shown in FIG. 10 may be automatically performed. The present invention is not limited to the embodiments described above.

[0146] Furthermore, although it is described in this embodiment that the administrator confirms the outputted mismatched link and the correction candidate in the step S6, the step S6 may be omitted and the remaining steps among the steps S1 to S7 may be automatically performed in another embodiment. The present invention is not limited to the embodiments described above.

[0147] Furthermore, although it is described in this embodiment that the administrator decides the timing to start the analysis, the present invention is not limited to that embodiment. In another embodiment, there may be provided a method having the steps of: previously setting the collection and extraction conditions; automatically performing the steps S1 to S5 at fixed intervals; and notifying the administrator of the obtained result by an electronic mail or the like. The present invention is not limited to the embodiments described above.

[0148] An Embodiment of the Detection of the Error Link

[0149] The operations of the condition detecting unit 13 and thecandidate providing unit 12 will be described in detail in thefollowings, with reference to FIGS. 3, 14 and 15A to 15D. In thisembodiment, the information storing unit 22 is capable of storing thelink information about the group of documents shown in FIG. 3.

[0150] Firstly, the condition detecting unit 13 is operated to read outthe link information from the information storing unit 22 to divide thelinks into some groups in accordance with the link information. Thecondition detecting unit 13 divides links having the same link sourcedescription into a same group. Then, the condition detecting unit 13further divides the links which is divided in the same group, having thesame link target into a same sub-group. Then, the condition detectingunit 13 extracts the links which has the different link target. Thecondition detecting unit 13 is further operated to give an criteriascore to each of the links in accordance with the number of linksincluded in the sub-group (the step T11 in FIG. 14).

[0151]FIG. 15A shows an example of the links extracted and the criteriascores given in the step T11. It can be understood from FIG. 15A thatthe links 211, 212, 213, and 214 are grouped as these links have a samelink source description “GX0011”, while the links 215, and 216 aregrouped as these links have a same link source description “GX0012”. Thethree links 211, 212 and 213 in the group having the link sourcedescription “GX0011” are further sub-grouped as these links have a samelink target “document 116”, while the link 214 is grouped into asub-group having the link target “document 117”. The link 215 in thegroup having the link source description “GX0012” is grouped into asub-group having the link target “document 116”, while the link 216 isgrouped into a sub-group having the link target “document 117”.

[0152] The method of giving the criteria score includes the steps of:setting the criteria score for each of the groups to “1”; setting thecriteria score for each of the sub-groups to a value which is obtainedby distributing the criteria score into the number in inverse proportionto the number of links in the sub-groups, and setting the criteria scorefor each of the links to a value which is obtained by dividing thecriteria score of each of the sub-groups equally into the number of thelinks in the sub-groups.

[0153] For example, as shown in FIG. 15A, the group of the link source description “GX0011” is given the criteria score “1”. When the criteria score is distributed in inverse proportion to the number of the links in each sub-group, the sub-group of the link target “document 116”, which contains three links, is given the criteria score “1/4”, while the sub-group of the link target “document 117”, which contains one link, is given the criteria score “3/4”. The criteria score “1/4” of the former sub-group is divided equally among the three links 211, 212, and 213, thereby giving each of the links 211, 212, and 213 the criteria score “1/12”, while the link 214, being the only link in its sub-group, is given the criteria score “3/4”. Similarly, each of the links 215 and 216 is given the criteria score “1/2”.
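The scoring rule of the step T11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `criteria_scores`, the dictionary layout of the link information, and the choice to give no score to a group containing only one sub-group (nothing differs, so nothing is extracted) are all introduced here for the example.

```python
from collections import defaultdict
from fractions import Fraction

# Link information of the FIG. 15A example (the dictionary layout is an assumption).
LINKS = {
    211: {"description": "GX0011", "target": "document 116"},
    212: {"description": "GX0011", "target": "document 116"},
    213: {"description": "GX0011", "target": "document 116"},
    214: {"description": "GX0011", "target": "document 117"},
    215: {"description": "GX0012", "target": "document 116"},
    216: {"description": "GX0012", "target": "document 117"},
}


def criteria_scores(links, group_key, subgroup_key):
    """Return {link id: criteria score} for one grouping condition."""
    groups = defaultdict(lambda: defaultdict(list))
    for link_id, info in links.items():
        groups[info[group_key]][info[subgroup_key]].append(link_id)

    scores = {}
    for subgroups in groups.values():
        if len(subgroups) < 2:
            continue  # assumed: a group without differing sub-groups is not extracted
        # Each group starts with the score 1, shared out among its sub-groups
        # in inverse proportion to the number of links in each sub-group.
        total_inverse = sum(Fraction(1, len(m)) for m in subgroups.values())
        for members in subgroups.values():
            subgroup_score = Fraction(1, len(members)) / total_inverse
            # The sub-group score is divided equally among its links.
            for link_id in members:
                scores[link_id] = subgroup_score / len(members)
    return scores


# Step T11: group by link source description, sub-group by link target.
T11_SCORES = criteria_scores(LINKS, "description", "target")
```

Swapping the two key arguments, as in `criteria_scores(LINKS, "target", "description")`, gives the grouping in which links are first grouped by link target and then sub-grouped by link source description.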

[0154] In the following step T12 in FIG. 14, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and to divide the links into groups in accordance with the link information. The condition detecting unit 13 places links having the same link target into the same group. Then, the condition detecting unit 13 further divides the links in each group so that links having the same link source description fall into the same sub-group. Then, the condition detecting unit 13 extracts the links whose link source descriptions differ within a group. The condition detecting unit 13 is further operated to give a criteria score to each link in accordance with the number of links included in the sub-group.

[0155] FIG. 15B shows an example of the links extracted and the criteria scores given in the step T12. It can be understood from FIG. 15B that the links 211, 212, 213, and 215 are grouped together as these links have the same link target “document 116”, while the links 214 and 216 are grouped together as these links have the same link target “document 117”. The three links 211, 212 and 213 in the group having the link target “document 116” are further sub-grouped as these links have the same link source description “GX0011”, while the link 215 is placed in a sub-group having the link source description “GX0012”. The link 214 in the group having the link target “document 117” is placed in a sub-group having the link source description “GX0011”, while the link 216 is placed in a sub-group having the link source description “GX0012”.

[0156] The method of giving the criteria score is the same as in the step T11. Thus, in the step T12, the criteria score of each of the links 211, 212 and 213 becomes “1/12”, the criteria score of the link 215 becomes “3/4”, and the criteria score of each of the links 214 and 216 becomes “1/2”.

[0157] In the following step T13 in FIG. 14, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and to divide the links into groups in accordance with the link information. The condition detecting unit 13 places links having the same link source and the same link source description into the same group. Then, the condition detecting unit 13 further divides the links in each group so that links having the same link target fall into the same sub-group. Then, the condition detecting unit 13 extracts the links whose link targets differ within a group. The condition detecting unit 13 is further operated to give a criteria score to each link in accordance with the number of links included in the sub-group.

[0158] FIG. 15C shows an example of the links extracted and the criteria scores given in the step T13. It can be understood from FIG. 15C that the links 215 and 216 are placed in the same group as these links have the same link source “document 115” and the same link source description “GX0012”. The link 215 is further placed in a sub-group having the link target “document 116”, while the link 216 is placed in a sub-group having the link target “document 117”.

[0159] The method of giving the criteria score is also the same as in the step T11. Thus, in the step T13, the criteria score of each of the links 215 and 216 is “1/2”.

[0160] In the following step T14 in FIG. 14, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and, in accordance with the link information, to extract the links whose link source description includes words that are not included in the title, the header or the highlighted character strings of the link target document. The condition detecting unit 13 gives the criteria score “1” to each of the extracted links.

[0161] FIG. 15D shows an example of the links extracted and the criteria scores given in the step T14. It can be understood from FIG. 3 that, for the links 214 and 215 shown in FIG. 15D, the words included in the link source description do not appear in the link target documents.

[0162] In the following step T15, the condition detecting unit 13 is operated to sum up the criteria scores given to each of the links in the steps T11 to T14. Therefore, the criteria score of each of the links 211, 212, and 213 becomes “1/6” obtained by an equation “1/12+1/12=1/6”. The criteria score of the link 214 becomes “9/4” obtained by an equation “3/4+1/2+1=9/4”. The criteria score of the link 215 becomes “11/4” obtained by an equation “1/2+3/4+1/2+1=11/4”. The criteria score of the link 216 becomes “3/2” obtained by an equation “1/2+1/2+1/2=3/2”.
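The summation of the step T15 can be checked with a few lines of Python. The per-step scores below are copied from the values stated for the steps T11 to T14; the names `STEP_SCORES` and `sum_scores` are illustrative assumptions, and exact fractions are used so that the totals can be compared with the equations in the text.

```python
from collections import defaultdict
from fractions import Fraction as F

# Criteria scores per step, as stated for FIGS. 15A to 15D.
STEP_SCORES = {
    "T11": {211: F(1, 12), 212: F(1, 12), 213: F(1, 12),
            214: F(3, 4), 215: F(1, 2), 216: F(1, 2)},
    "T12": {211: F(1, 12), 212: F(1, 12), 213: F(1, 12),
            214: F(1, 2), 215: F(3, 4), 216: F(1, 2)},
    "T13": {215: F(1, 2), 216: F(1, 2)},
    "T14": {214: F(1), 215: F(1)},
}


def sum_scores(step_scores):
    """Add up, per link, the scores given under every condition."""
    totals = defaultdict(F)  # F() is the fraction 0
    for scores in step_scores.values():
        for link_id, score in scores.items():
            totals[link_id] += score
    return dict(totals)


TOTALS = sum_scores(STEP_SCORES)
for link_id in sorted(TOTALS):
    print(link_id, TOTALS[link_id])
```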

[0163] In the following step T16 in FIG. 14, the condition detecting unit 13 is operated to compare the sums of the criteria scores of the sub-groups with each other, and then to extract the links of the sub-group having the higher criteria score as mismatched links. The candidate providing unit 12 is operated to provide the correction candidate for the extracted links under each condition so as to harmonize the link information of the link having the higher score with that of the links having the lower score in the same group.

[0164] As shown in FIG. 15A, in the group of the link source description “GX0011”, the sum of the criteria scores of the sub-group including the links 211, 212 and 213 becomes “1/2” obtained by an equation “1/6+1/6+1/6=1/2”, and the sum of the criteria scores of the sub-group including the link 214 becomes “9/4”. Therefore, the link 214, which has the higher criteria score, is decided as the mismatched link in this case. In order to harmonize the link information of the link 214 with that of the sub-group including the links 211, 212 and 213, it can be understood that the correction candidate for the link 214 is appropriately obtained as “link target: document 116”.

[0165] Furthermore, in the group of the link source description “GX0012” in FIG. 15A, the sum of the criteria scores of the sub-group including the link 215 becomes “11/4”, and the sum of the criteria scores of the sub-group including the link 216 becomes “3/2”. Therefore, the link 215 is decided as the mismatched link in this case. In order to harmonize the link information of the link 215 with that of the sub-group including the link 216, it can be understood that the correction candidate for the link 215 is appropriately obtained as “link target: document 117”. By the same token, in FIG. 15B, the link 215 is decided as the mismatched link, and the correction candidate thereof is decided as “link source description: GX0011”, so as to agree with the sub-group including the links 211, 212 and 213. By the same token, in FIG. 15C, the link 215 is decided as the mismatched link, and the correction candidate thereof is decided as “link target: document 117”. It is understood from the above results that the mismatched links are the links 214 and 215, and that the correction candidates of the links 214 and 215 are “link target: document 116” or “link source description: GX0012”, and “link target: document 117” or “link source description: GX0011”, respectively.
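The comparison of the step T16 can be sketched as below. This is a hedged illustration only: the function `detect_mismatch` and its argument layout are assumptions, and the summed scores are the totals stated for the step T15. Within one group, the sub-group with the highest summed score is reported as mismatched, and the value shared by the lowest-scoring sub-group is offered as the correction candidate.

```python
from fractions import Fraction as F

# Summed criteria scores per link, as stated for the step T15.
TOTALS = {211: F(1, 6), 212: F(1, 6), 213: F(1, 6),
          214: F(9, 4), 215: F(11, 4), 216: F(3, 2)}


def detect_mismatch(subgroups, totals):
    """subgroups maps a shared value (e.g. a link target) to its link ids.

    Returns (mismatched link ids, correction candidate value), or None when
    the group has fewer than two sub-groups or the sums are tied.
    """
    if len(subgroups) < 2:
        return None
    sums = {value: sum(totals[link_id] for link_id in members)
            for value, members in subgroups.items()}
    highest = max(sums, key=sums.get)
    lowest = min(sums, key=sums.get)
    if sums[highest] == sums[lowest]:
        return None
    return subgroups[highest], lowest


# The two groups of FIG. 15A, sub-grouped by link target.
GX0011 = {"document 116": [211, 212, 213], "document 117": [214]}
GX0012 = {"document 116": [215], "document 117": [216]}

RESULT_GX0011 = detect_mismatch(GX0011, TOTALS)
RESULT_GX0012 = detect_mismatch(GX0012, TOTALS)
```

With these inputs, the first group yields the link 214 with the candidate “document 116” and the second group yields the link 215 with the candidate “document 117”, in line with the worked example.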

[0166] Although it is described in this embodiment that the link having the higher sum of the criteria scores is decided as the mismatched link, the present invention is not limited to that example. In another embodiment, there is provided a method of deciding the mismatched link having the steps of: setting a predetermined threshold for the criteria score; and deciding the link as the mismatched link only when the criteria score thereof is higher than the threshold, even if the criteria score thereof is higher than those of the others. The present invention is not limited to the embodiments as described above.

[0167] Furthermore, although it is described in this embodiment that the criteria score is calculated, for example, based on the number of the links in each of the sub-groups, the present invention is not limited to that example. The criteria score may simply be the number of times the link is extracted. In another embodiment, there may be provided a method of calculating the criteria score having the steps of: specifying a characteristic vector of the link based on the number of links in the sub-group; preparing characteristic vectors of mismatched links as teaching data; and calculating a mean of the distances between the characteristic vector of the link and the characteristic vectors of the mismatched links to obtain the criteria score. The present invention is not limited to the embodiments described above.

[0168] Furthermore, it is described in this embodiment that the error link is extracted by summing up criteria scores including: (1) a first criteria score calculated by comparing with each other the link source descriptions of a plurality of links to the same link target page; (2) a second criteria score calculated by comparing with each other the link target pages of a plurality of links represented by the same link source description; (3) a third criteria score calculated by comparing with each other the link target pages of a plurality of links having the same link source page and the same link source description; and (4) a fourth criteria score calculated by comparing the link source description with the contents of the link target page. However, the present invention is not limited to that example. In another embodiment, the criteria score may be calculated according to at least one of the above criteria scores, or according to criteria scores weighted for each of the conditions. The present invention is not limited to the above embodiments of the method.

[0169] An Embodiment of the Detection of the Expired Link

[0170] The operations of the condition detecting unit 13 and the candidate providing unit 12 in the detection of the expired link will be described in detail below with reference to FIGS. 4 and 16 of the drawings.

[0171] Firstly, the condition detecting unit 13 is operated to extract links including dated expressions in the link source description thereof, or pointing to documents including dated expressions. Then, the condition detecting unit 13 is operated to calculate the expiration date from the dated expression related to the extracted link, and to judge whether the present date and time is prior to the expiration date or not (the step T21 in FIG. 16).

[0172] In the following step T22 in FIG. 16, the condition detecting unit 13 is operated to extract the expired expressions from the link target document related to the extracted link. In this embodiment, the expired expression means an expression commonly used in a notice given when a service is terminated, closed, or moved, such as “Closed.”, “Moved”, “Ended.”, “Automatically jump after a few seconds.”, “effective in [date]”, “We appreciated your past patronage.”, “We appreciated your past participation.”, and so on. Besides the above expired expressions, if the HTML description indicates that the document automatically jumps to another document after a few seconds, this is also extracted as an expired expression.

[0173] In the following step T23 in FIG. 16, the condition detecting unit 13 is operated to calculate the criteria score of the link by integrating the result of the judgment made in the step T21 as to whether the present date and time is prior to the expiration date or not, and the number of the expired expressions extracted in the step T22. When this criteria score is higher than or equal to a predetermined threshold, the link having the criteria score is outputted as the mismatched link.

[0174] As an example of the method of calculating the criteria score of the link, there may be provided a method including the step of multiplying together the number of days elapsed since the expiration date and the number of appearances of the extracted expired expressions. As another embodiment, there may be provided a method of calculating the criteria score including the steps of: specifying a characteristic vector of the link based on the number of days elapsed since the expiration date and the number of appearances of the extracted expired expressions; calculating a mean value of distances between the specified characteristic vector of the link and characteristic vectors of mismatched links prepared as teaching data; and setting the mean value as the criteria score. The present invention is not limited to the embodiments described above.

[0175] In the following step T24, the candidate providing unit 12 is operated to extract, from the link target document, the new address to which the link outputted as the mismatched link has moved, and to specify the new address as the correction candidate. In this embodiment, the new address means an address to which the document automatically jumps in accordance with the HTML. Instead of the automatic jump of the document, an expression such as “Click here.” or “Move to the following URL.” may be extracted. Then, the target address of a link included in the expression or written in the vicinity of the expression may be specified as the new address to be the correction candidate. When, on the other hand, the new address cannot be extracted, the correction candidate may be outputted as “link: delete”.
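One way the step T24 might look in practice is sketched below. It assumes that the automatic jump is expressed with an HTML `meta http-equiv="refresh"` element, which the text does not state explicitly; the class name, the function name, the sample pages and the example URL are likewise hypothetical. Only the Python standard library is used.

```python
from html.parser import HTMLParser


class RefreshTargetParser(HTMLParser):
    """Collects the URL of the first <meta http-equiv="refresh"> element."""

    def __init__(self):
        super().__init__()
        self.new_address = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.new_address is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://..."
        content = attributes.get("content") or ""
        _delay, _sep, rest = content.partition(";")
        key, _eq, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.new_address = value.strip().strip("'\"")


def correction_candidate(html_text):
    """Return the new address as a candidate, or "link: delete" if none is found."""
    parser = RefreshTargetParser()
    parser.feed(html_text)
    if parser.new_address:
        return "link target: " + parser.new_address
    return "link: delete"


# Hypothetical link target documents.
MOVED_PAGE = ('<html><head><meta http-equiv="refresh" '
              'content="5; url=http://www.example.com/new/"></head>'
              '<body>Moved</body></html>')
CLOSED_PAGE = "<html><body>Closed.</body></html>"

print(correction_candidate(MOVED_PAGE))
print(correction_candidate(CLOSED_PAGE))
```

Extracting an address from expressions such as “Click here.” would need an additional pass over the anchors near the expression and is left out of this sketch.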

[0176] An example of the operations of the condition detecting unit 13 and the candidate providing unit 12 will be described below with reference to FIG. 4A. Here, the method of calculating the criteria score of the link described above, including the step of multiplying together the number of days elapsed since the expiration date and the number of appearances of the extracted expired expressions, is used.

[0177] Referring also to the step T21 of FIG. 16, as the document 125 includes the dated expression “Jul. 20, 2002 to Aug. 31, 2002.”, the condition detecting unit 13 is operated to extract the links 221, 222, 223, and 224. Assuming that the present date is Aug. 15, 2002, the condition detecting unit 13 judges that the present date is prior to the expiration date of the document 125, thereby judging that the links 221, 222, 223, and 224 are not expired.

[0178] In the next step T22 of FIG. 16, nothing is extracted, as the document 125 does not include any expired expression.

[0179] With the result obtained in the step T21 that the present date is prior to the expiration date, and the result obtained in the step T22 that no expired expressions are extracted, both the number of days elapsed since the expiration date and the number of appearances of the extracted expired expressions are calculated to be “0”. Therefore, the criteria scores of the links 221, 222, 223, and 224 become “0” obtained by an equation “0×0=0”. Therefore, it is judged in the next step T23 of FIG. 16 that all of the links 221, 222, 223, and 224 are appropriate.

[0180] Another example of the operations of the condition detecting unit 13 and the candidate providing unit 12 will be described below with reference to FIG. 4B.

[0181] Referring also to the step T21 of FIG. 16, as the document 125 includes the dated expression “Jul. 20, 2002 to Aug. 31, 2002.”, the condition detecting unit 13 is operated to extract the link 224. Assuming that the present date is Sep. 15, 2002, the condition detecting unit 13 judges that the present date is past the expiration date of the document 125, thereby judging that the link 224 is expired.

[0182] In the next step T22 of FIG. 16, the condition detecting unit 13 is operated to extract the expired expression “Closed.”.

[0183] With the result obtained in the step T21 that the present date is past the expiration date, and the result obtained in the step T22 that the expired expression “Closed.” is extracted, the number of days elapsed since the expiration date is calculated to be “15”, and the number of appearances of the extracted expired expression is calculated to be “1”. Accordingly, the criteria score of the link 224 is “15” obtained by an equation “15×1=15”. Therefore, when the threshold is set to “10”, it is judged that the link 224 is the mismatched link.
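The two worked examples can be reproduced with a short sketch. It assumes that the first factor is the number of days by which the present date exceeds the expiration date (zero when not yet expired), which matches the value “15” for Sep. 15, 2002 against Aug. 31, 2002. The function names, the reduced list of expired expressions and the sample document texts are hypothetical.

```python
from datetime import date

# A subset of the expired expressions listed for the step T22.
EXPIRED_EXPRESSIONS = ("Closed.", "Moved", "Ended.")


def expired_link_score(expiration, today, target_text):
    """Days past the expiration date multiplied by the expired-expression count."""
    days_past = max(0, (today - expiration).days)
    appearances = sum(target_text.count(e) for e in EXPIRED_EXPRESSIONS)
    return days_past * appearances


def is_mismatched(score, threshold=10):
    """The link is outputted when the score is higher than or equal to the threshold."""
    return score >= threshold


EXPIRATION = date(2002, 8, 31)

# Hypothetical text of the document 125 in FIG. 4A and in FIG. 4B.
SCORE_FIG_4A = expired_link_score(
    EXPIRATION, date(2002, 8, 15), "Campaign: Jul. 20, 2002 to Aug. 31, 2002.")
SCORE_FIG_4B = expired_link_score(
    EXPIRATION, date(2002, 9, 15), "Campaign: Jul. 20, 2002 to Aug. 31, 2002. Closed.")

print(SCORE_FIG_4A, is_mismatched(SCORE_FIG_4A))
print(SCORE_FIG_4B, is_mismatched(SCORE_FIG_4B))
```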

[0184] In the next step T24 of FIG. 16, the candidate providing unit 12 is operated to extract the new address. However, as the document 125 shown in FIG. 4B does not include a corresponding address, the candidate providing unit 12 cannot obtain the new address. Therefore, the candidate providing unit 12 outputs “link: delete” as the correction candidate of the link 224.

[0185] Although it is described in this embodiment that the expired link is detected by the dated expression and the expired expression, the present invention is not limited to this method. For example, the detecting method may, similarly to the detection of the error link described above, include the steps of: grouping the links having the same link target page; and detecting sub-groups having different link source descriptions in the same group. Furthermore, in another embodiment, the detecting method may include the steps of: grouping the links having the same link source description; and detecting the sub-groups having different link targets in the same group.

[0186] An Embodiment of the Detection of the Disunity in the Link Source Descriptions

[0187] The operations of the condition detecting unit 13 and the candidate providing unit 12 for the detection of the disunity in the link source descriptions will be described in detail below, with reference to FIGS. 5, 17 and 18 of the drawings.

[0188] Firstly, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and to divide the links into groups in accordance with the link information. The condition detecting unit 13 places links having the same link target into the same group. Then, the condition detecting unit 13 further divides the links in each group so that links having the same link source description fall into the same sub-group. Then, the condition detecting unit 13 extracts the links whose link source descriptions differ within a group. The condition detecting unit 13 is further operated to give a criteria score to each link in accordance with the number of links included in the sub-group, in the step T31 in FIG. 17.

[0189] FIG. 18 shows an example of the links extracted and the criteria scores given in the step T31, when the relationship between documents is as shown in FIG. 5. It can be understood from FIG. 18 that the links 231, 232, 233, and 234 are grouped together as these links have the same link target “document 135”. The three links 231, 232, and 233 are further placed in a sub-group of the same link source description “GX Series”, while the link 234 is placed in a sub-group of the link source description “gX Series”.

[0190] The method of giving the criteria score includes the steps of: setting the criteria score for each of the groups to “1”; setting the criteria score for each of the sub-groups to a value obtained by distributing the criteria score of the group among the sub-groups in inverse proportion to the number of links in each sub-group; and setting the criteria score for each of the links to a value obtained by dividing the criteria score of its sub-group equally among the links in that sub-group. Therefore, the criteria score of each of the links 231, 232, and 233, given in the step T31 of FIG. 17, becomes “1/12”, while the criteria score of the link 234, also given in the step T31 of FIG. 17, becomes “3/4”, as shown in FIG. 18.

[0191] The condition detecting unit 13 is then operated to compare the sums of the criteria scores of the sub-groups with each other, and then to extract the links of the sub-group having the higher criteria score as mismatched links. In FIG. 18, the criteria score “3/4” of the link 234 is higher than the sum “1/4” of the criteria scores of the links 231, 232 and 233. Therefore, the link 234 is extracted as the mismatched link.

[0192] In the following step T32 in FIG. 17, the candidate providing unit 12 is operated to investigate whether the link source description of the extracted link is registered in a glossary or not. In this embodiment, the glossary means a table in which each expression to be unified is registered as a value, with the variant descriptions of the word as keys. For example, the word “free software” means software available without charge, and has a plurality of variant descriptions, such as “free ware” and “free soft”. When the administrator wishes to unify these words into the word “free software”, the words “free ware” and “free soft” are treated as keys, and the word “free software” is treated as the value. These words may be registered in the glossary.

[0193] When the link source description of the extracted link is already registered in the glossary (YES in the step T32 in FIG. 17), the candidate providing unit 12 is operated to output, as the correction candidate, the unified expression corresponding to the key, in the step T33 in FIG. 17. In order to fully absorb variations in description, a fuzzy search may be performed when the key is searched. In another embodiment, the method of calculating the correction candidate may include the steps of: conducting a fuzzy search directly against the unified expressions, without registering the variant descriptions; judging whether the degree of character-string similarity is higher than or equal to a threshold or not; and adopting the found unified expression as the correction candidate when the judgment is made that the degree of character-string similarity is higher than or equal to the threshold.

[0194] When, on the other hand, the link source description of the extracted link is not registered in the glossary (NO in the step T32 in FIG. 17), the candidate providing unit 12 is operated to provide the correction candidate so as to harmonize the link source description having the higher criteria score with that having the lower criteria score in the same group, in the step T34 in FIG. 17. In the case shown in FIG. 18, the candidate providing unit 12 outputs “link source description: GX Series” as the correction candidate.
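The branch of the steps T32 to T34 can be sketched as follows. The glossary entries come from the “free software” example; the function name `unified_description`, the use of Python's `difflib` for the fuzzy search, and the cutoff value 0.8 are assumptions chosen for illustration rather than details given in the text.

```python
import difflib

# Keys are variant descriptions; values are the unified expressions.
GLOSSARY = {"free ware": "free software", "free soft": "free software"}


def unified_description(description, glossary, group_description, cutoff=0.8):
    """Return the correction candidate for a mismatched link source description.

    group_description is the description used by the lower-scoring sub-group
    of the same group; it is the fallback when the glossary has no entry.
    """
    # Step T32/T33: exact glossary hit.
    if description in glossary:
        return glossary[description]
    # Optional fuzzy search over the keys to absorb small variations.
    close = difflib.get_close_matches(description, list(glossary), n=1, cutoff=cutoff)
    if close:
        return glossary[close[0]]
    # Step T34: harmonize with the rest of the group.
    return group_description


print(unified_description("free ware", GLOSSARY, "free software"))
print(unified_description("freeware", GLOSSARY, "free software"))
print(unified_description("gX Series", GLOSSARY, "GX Series"))
```

In the FIG. 18 case neither “GX Series” nor “gX Series” is in the glossary, so the fallback branch supplies “GX Series”.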

[0195] It is assumed that neither of the words “GX Series” and “gX Series”, shown in FIG. 18, is registered in the glossary.

[0196] Although it is described in this embodiment that the criteria score is calculated, for example, based on the number of the links in each of the sub-groups, the present invention is not limited to the embodiments described above. In another embodiment, there is provided a method of calculating the criteria score having the steps of: specifying a characteristic vector of the link based on the number of links included in the sub-group; calculating a mean value of distances between the specified characteristic vector of the link and characteristic vectors of mismatched links prepared as teaching data; and setting the mean value as the criteria score. The present invention is not limited to the embodiments described above.

[0197] An Embodiment of the Detection of the Disunity in the Styles of the Link Source Descriptions

[0198] The operations of the condition detecting unit 13 and the candidate providing unit 12 for the detection of the disunity in the style of the link source description will be described in detail below, with reference to FIGS. 6, 19 and 20 of the drawings.

[0199] Firstly, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and to divide the links into groups in accordance with the link information. The condition detecting unit 13 places links having the same link source document into the same group. Then, the condition detecting unit 13 further divides the links in each group so that links having the same target attribute fall into the same sub-group. Then, the condition detecting unit 13 extracts the links whose target attributes differ within a group. The condition detecting unit 13 is further operated to give a criteria score to each link in accordance with the number of links included in the sub-group, in the step T41 in FIG. 19.

[0200] FIG. 20 shows an example of the links extracted and the criteria scores given in the step T41 in the case where the relationship between the documents is as shown in FIG. 6. It can be understood from FIG. 20 that the links 241, 242, 243, and 244 are grouped together as these links have the same link source “document 141”. The three links 241, 242, and 243 are further placed in a sub-group of the same target attribute “_blank”, while the link 244 is placed in a sub-group of the target attribute “not specified”.

[0201] The method of giving the criteria score includes the steps of: setting the criteria score for each of the groups to “1”; setting the criteria score for each of the sub-groups to a value obtained by distributing the criteria score of the group among the sub-groups in inverse proportion to the number of links in each sub-group; and setting the criteria score for each of the links to a value obtained by dividing the criteria score of its sub-group equally among the links in that sub-group. Therefore, as shown in FIG. 20, in the step T41, the criteria score of each of the links 241, 242, and 243 becomes “1/12”, while the criteria score of the link 244 becomes “3/4”.

[0202] The condition detecting unit 13 is then operated to compare the sums of the criteria scores of the sub-groups with each other, and then to extract the links of the sub-group having the higher criteria score as mismatched links. In FIG. 20, the criteria score “3/4” of the link 244 is higher than the sum “1/4” of the criteria scores of the links 241, 242 and 243. Therefore, the link 244 is extracted as the mismatched link.

[0203] In the following step T42 in FIG. 19, the candidate providing unit 12 is operated to provide the correction candidate so as to harmonize the target attribute having the higher criteria score with that having the lower criteria score in the same group. In the case shown in FIG. 20, the candidate providing unit 12 outputs “target attribute: _blank” as the correction candidate.
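For a single link source document, the steps T41 and T42 might be sketched as below. Because the sub-group score is inversely proportional to the number of links, the sub-group with the highest summed score is always the one with the fewest links, so this sketch simply compares sub-group sizes. The parser class, the function name, the sample HTML and the `href` values are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTargetParser(HTMLParser):
    """Collects (href, target attribute) for every anchor in a document."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            target = attributes.get("target") or "not specified"
            self.anchors.append((attributes["href"], target))


def target_attribute_candidates(html_text):
    """Map each mismatched link (by href) to its correction candidate."""
    parser = AnchorTargetParser()
    parser.feed(html_text)
    subgroups = defaultdict(list)
    for href, target in parser.anchors:
        subgroups[target].append(href)
    if len(subgroups) < 2:
        return {}
    largest = max(subgroups, key=lambda t: len(subgroups[t]))   # lowest score
    smallest = min(subgroups, key=lambda t: len(subgroups[t]))  # highest score
    if len(subgroups[largest]) == len(subgroups[smallest]):
        return {}
    return {href: "target attribute: " + largest for href in subgroups[smallest]}


# Hypothetical content standing in for the document 141 of FIG. 6.
DOCUMENT_141 = """
<a href="page1.html" target="_blank">One</a>
<a href="page2.html" target="_blank">Two</a>
<a href="page3.html" target="_blank">Three</a>
<a href="page4.html">Four</a>
"""

CANDIDATES = target_attribute_candidates(DOCUMENT_141)
print(CANDIDATES)
```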

[0204] Although it is described in this embodiment that the targets to be grouped in the step T41 of FIG. 19 are the links having the same link source document, the present invention is not limited to this embodiment. In another embodiment, there may be provided a method including the step of grouping into the same group the links having the same link source description and included in a particular area, such as a table or a list of links. In another embodiment, there may be provided a method including the steps of: grouping the links among a plurality of documents, such as a particular document and the documents stored in the same directory as the particular document, based on the style; and detecting the disunity in the link style among the pages neighboring the particular document.

[0205] Although the method of detecting the disunity in the target attribute and calculating the correction candidate has been described in this embodiment, a similar method of detecting disunity in style attributes and calculating the correction candidate may be provided.

[0206] In this embodiment, the criteria score is calculated, for example, based on the number of the links in each of the sub-groups. The present invention is not limited to this embodiment. In another embodiment, there is provided a method of calculating the criteria score having the steps of: specifying a characteristic vector of the link based on the number of links in the sub-group; preparing characteristic vectors of mismatched links as teaching data; and calculating a mean of the distances between the characteristic vector of the link and the characteristic vectors of the mismatched links to obtain the criteria score.

[0207] An Embodiment of the Detection of the Phantom Link

[0208] The operations of the condition detecting unit 13 and the candidate providing unit 12 in the detection of the phantom link will be described in detail below with reference to FIGS. 7 and 21 of the drawings.

[0209] Firstly, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22 and, according to the link information, to extract the link having an invisible link source description, in the step T51 in FIG. 21. In this embodiment, the invisible link source description means a null character string, a transparent image, an extremely small image or character, or an image or character which is the same color as that of the background. In FIG. 7A, the link having a link source description specifying a null character string is extracted.

[0210] In the following step T52 in FIG. 21, the candidate providing unit 12 is operated to output “link: delete” as the correction candidate so as to delete the link.
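The phantom link test of the steps T51 and T52 can be illustrated with the simplified sketch below. It covers only the null character string, the very small character and the same-color cases; detecting a transparent image would require inspecting image data and is omitted. The record layout, the field names and the 2-pixel size limit are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class LinkSourceDescription:
    """Hypothetical record of how a link source description is rendered."""
    text: str = ""
    font_size_px: float = 16.0
    color: str = "#000000"
    background: str = "#ffffff"


def is_phantom(description, min_font_size_px=2.0):
    """Judge whether the link source description is effectively invisible."""
    if not description.text.strip():
        return True  # null character string
    if description.font_size_px < min_font_size_px:
        return True  # extremely small characters (limit is an assumption)
    # Characters drawn in the same color as the background.
    return description.color.lower() == description.background.lower()


def correction_candidate(description):
    """Step T52: a phantom link receives the candidate "link: delete"."""
    return "link: delete" if is_phantom(description) else None


print(correction_candidate(LinkSourceDescription(text="")))
print(correction_candidate(LinkSourceDescription(text="GX Series")))
```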

[0211] An Embodiment of the Detection of the Loop Link

[0212] The operations of the condition detecting unit 13 and the candidate providing unit 12 for the detection of the loop link, or looped link, will be described in detail below, with reference to FIGS. 8 and 22 of the drawings.

[0213] Firstly, the condition detecting unit 13 is operated to read out the link information from the information storing unit 22, and to separate the link source description of each link read from the information storing unit 22 into words, in the step T61 in FIG. 22. The link source description may be separated into words by conducting a morphological analysis, by separating the link source description where the character type changes, or by separating the link source description every several letters.

[0214] In the following step T62 in FIG. 22, the condition detecting unit 13 is operated to extract a group of links that form a loop and whose link source descriptions share the same word. In FIG. 8, the links 261, 262 and 263, all of which include the word “present”, form a loop, and are therefore regarded as loop links to be outputted.
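A possible reading of the steps T61 and T62 is sketched below. Words are separated on white space, which is a simplification of the separation methods mentioned for the step T61, and a loop is searched by following links from each document until the starting document is reached again while at least one word stays common to every link on the path. The document names, the descriptions and the extra link 264 are hypothetical.

```python
from collections import defaultdict


def find_loop_links(links):
    """links maps link id -> (source document, target document, description).

    Returns a set of frozensets, each holding the ids of links that form a
    loop and share at least one word in their link source descriptions.
    """
    outgoing = defaultdict(list)
    for link_id, (source, _target, _description) in links.items():
        outgoing[source].append(link_id)

    loops = set()

    def walk(start, current, path, visited, common):
        for link_id in outgoing.get(current, []):
            _source, target, description = links[link_id]
            words = set(description.lower().split())
            shared = words if common is None else common & words
            if not shared:
                continue  # no word common to every link on this path
            if target == start:
                loops.add(frozenset(path + [link_id]))
            elif target not in visited:
                walk(start, target, path + [link_id], visited | {target}, shared)

    for document in list(outgoing):
        walk(document, document, [], {document}, None)
    return loops


# Hypothetical link information modeled on FIG. 8.
LINKS = {
    261: ("document A", "document B", "present campaign"),
    262: ("document B", "document C", "present application"),
    263: ("document C", "document A", "present winners"),
    264: ("document A", "document D", "company profile"),
}

LOOPS = find_loop_links(LINKS)
print(LOOPS)
```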

[0215] Although the method described in this embodiment extracts the loop links in which all of the link source descriptions include the same word, the present invention is not limited to this embodiment. In another embodiment, there may be provided a method including the steps of: preparing a dictionary including characteristic words classified under each of specific topics; and extracting the loop links by judging whether each of the link source descriptions includes the characteristic words classified under the same topic. The present invention is not limited to the embodiments described above.

[0216] A Method of Detecting Mismatched Link Focused on a Change with Time

[0217] Although the method described in this embodiment detects some kinds of the mismatched links based on the link information of each of the links collected at the same time, the present invention is not limited to this embodiment. In another embodiment, there may be provided a method of detecting all kinds of mismatched links including the steps of: repeating the collection of the link information periodically; and detecting all kinds of mismatched links by focusing on a change in the link information over time. The operations of the condition detecting unit 13 and the candidate providing unit 12 in the method of detecting a mismatched link focused on a change over time will be described below with reference to FIGS. 1, 4, 23 and 24 of the drawings.

[0218] The information storing unit 22, shown in FIG. 1, is adapted to store therein the link information at times T and T′.

[0219] Firstly, referring to the step T71 in FIG. 23, the condition detecting unit 13 is operated to group the links which are the same in at least one item of the link information at times T and T′. FIG. 24 shows an example of the links grouped into a group of the link target “document 125” in accordance with the link information on Aug. 15, 2002, and on Sep. 15, 2002, when the relationship of the documents is as shown in FIG. 4.

[0220] In the following step T72 in FIG. 23, when many of the links in a group have changed in the link information, the remaining link in the same group is extracted as the mismatched link. In the case of FIG. 24, there are four links with the link target “document 125” on Aug. 15, 2002, but there is only one link with the link target “document 125” on Sep. 15, 2002. Therefore, the link 224 is extracted as the mismatched link.

[0221] In the following step T72 in FIG. 23, the candidate providing unit 12 is operated to provide the correction candidate to compensate for the change that occurred between the times T and T′. Referring to FIG. 23, because the rest of the links 221, 222, and 223 were deleted between Aug. 15, 2002 and Sep. 15, 2002, the candidate providing unit 12 provides “link: delete” as the correction candidate.

[0222] As described above, in this embodiment, the links having the same link target document at times T and T′ are grouped into the same group, and when some of the links included in the group change between the times T and T′, the rest of the link(s) in the group is (are) extracted as the mismatched link. Although it is described in this embodiment that the change is the deletion of some of the links, the change is not limited to that example. For example, when the link target document changes for some of the links, the candidate providing unit 12 may provide a correction candidate that prompts the user to correct the link source description.
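The time-based detection described above can be sketched as follows. The snapshot representation (a mapping from link identifier to link target), the function name detect_by_change, the 0.5 ratio used to decide that "many" links changed, and the literal candidate string are all assumptions for illustration; the embodiment does not specify them.

```python
from collections import defaultdict


def detect_by_change(links_t, links_t2, min_changed_ratio=0.5):
    """Flag links left behind after most links to the same target were deleted.

    links_t and links_t2 map a link id to its link target at times T and T'.
    Returns a list of (link id, correction candidate) pairs.
    """
    # Group the links collected at time T by their link target.
    groups = defaultdict(list)
    for link_id, target in links_t.items():
        groups[target].append(link_id)

    findings = []
    for target, ids in groups.items():
        deleted = [i for i in ids if i not in links_t2]
        remaining = [i for i in ids if links_t2.get(i) == target]
        # Assumed rule: if more than the given ratio of the group was deleted,
        # the surviving links are suspected to point at stale information.
        if remaining and len(deleted) / len(ids) > min_changed_ratio:
            findings.extend((i, "link: delete") for i in remaining)
    return findings
```

With the data of the FIG. 24 example (links 221 to 224 pointing at “document 125” at time T, and only link 224 remaining at time T′), this sketch reports link 224 with a deletion candidate.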

[0223] Although it is described in this embodiment that the links having the same link target document at times T and T′ are grouped into the same group, the present invention is not limited to this embodiment. In another embodiment, there may be provided a method including the steps of: grouping links having the same link source description into the same group; and detecting a change in the style or target attribute.

[0224] The effect of this embodiment will be described below.

[0225] In this embodiment, all kinds of logical mismatches can be detected. More specifically, the kinds of detectable logical mismatches may include: (1) putting a link to a wrong destination or target; (2) putting a link to expired information; (3) disunity in the link source descriptions; and (4) disunity in the styles of the link source descriptions. These can be detected because the mismatched link detecting method includes the steps of: extracting the link information from the hypertext database; grouping the links by each item of the link information; and detecting a particular link excluded from the group and regarding it as a mismatched link. Logical mismatches such as (2), the link to expired information, may be detected by repeating the collection of the link information periodically and focusing on a change in the link information over time.

[0226] Furthermore, (5) the phantom link, one example of a logical mismatch, may be detected by detecting a link having no link source description, and (6) the loop link, another example of a logical mismatch, may be detected by detecting the links that are included in a group of links forming a loop and whose link source descriptions are relevant to the same topic.
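The grouping step of paragraph [0225] and the phantom-link check of paragraph [0226] can be sketched as follows. The tuple layout of a link, the strict-majority rule, and the function names are assumptions for illustration; the embodiment may group by other items of the link information and may use a different criterion for exclusion from a group.

```python
from collections import Counter, defaultdict


def minority_descriptions(links):
    """Find links whose source description differs from the group majority.

    links is a list of (link id, link source description, link target).
    Returns (link id, majority description) pairs, where the majority
    description doubles as a correction candidate.
    """
    by_target = defaultdict(list)
    for link_id, description, target in links:
        by_target[target].append((link_id, description))

    findings = []
    for members in by_target.values():
        counts = Counter(description for _, description in members)
        majority, n = counts.most_common(1)[0]
        if n <= len(members) / 2:
            continue  # assumed rule: require a strict majority
        findings.extend(
            (link_id, majority)
            for link_id, description in members
            if description != majority
        )
    return findings


def phantom_links(links):
    """Return the ids of links that have no link source description."""
    return [link_id for link_id, description, _ in links if not description.strip()]
```

In this sketch the majority description is also returned as the correction candidate, which corresponds to harmonizing the excluded link with the rest of its group.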

[0227] In this embodiment, a correction candidate for the logical mismatch can be provided to the administrator. More specifically, the candidate correcting method may include a process of automatically calculating the correction candidate so as to harmonize the link information of the particular link excluded from the group with the link information of the rest of the links in the group. Therefore, it is unnecessary for the administrator to consider how to correct the mismatched links, and it is further possible to reflect the correction automatically.

[0228] Furthermore, the grouped mismatched links can be collectively displayed on a display screen in this embodiment. Therefore, the administrator only has to confirm a part of the links to judge whether the remaining links are mismatched or not. The efficiency of checking by the administrator can thus be considerably enhanced.

[0229] In this embodiment, there may be provided a display screen on which a list is displayed, sorted by each of three items including:

[0230] (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page. Therefore, the administrator can grasp the correction items for each page, intensively examine mismatches involving a key page, and examine the suitability of the expression used for the link source description.

[0231] In this embodiment, the data processing unit 1 includes the information collecting unit 11, but the information collecting unit 11 may be omitted from the data processing unit 1, because the collection and storage of information about pages and links from the hypertext database 21, which is performed by the information collecting unit 11 in this embodiment, may be performed by another data processing unit, not shown.

[0232] Furthermore, the correction reflecting unit 14 in this embodiment may be omitted from the data processing unit 1 when the administrator can correct the mismatched parts in the hypertext database 21 manually while viewing a display screen of a list of the results shown in FIG. 13. Even if there is no information about the kind of mismatched link or the correction candidate, the administrator can derive a correction candidate from the other information shown on the display screen in FIG. 13. Therefore, the candidate providing unit 12 in this embodiment may also be omitted from the data processing unit 1.

[0233] Second Preferred Embodiment

[0234] Referring now to FIG. 25 of the drawings, there is shown a second preferred embodiment of the hypertext checking apparatus according to the present invention.

[0235] As shown in FIG. 25, the data processing unit 5 includes the same constituent elements as those of the data processing unit 1 shown in FIG. 1 in the first embodiment. In addition, the data processing unit 5 of this embodiment includes an importance calculating unit 15.

[0236] The importance calculating unit 15 is adapted to calculate an importance value for the mismatched link extracted by the condition detecting unit 13 in accordance with the access frequency to the document involved in the detected mismatched link, or the seriousness of the mismatched link, and to output the calculated importance values with ranks.

[0237] The operation of the data processing unit 5 in this embodiment will be described below with reference to the drawings.

[0238] The operations of the information collecting unit 11 and the condition detecting unit 13 of this embodiment, shown in the steps S1 to S3 in FIG. 26, are the same as those of the information collecting unit 11 and the condition detecting unit 13 of the first embodiment shown in FIG. 10, and the description of these steps is therefore omitted. Then, in the step S4, the candidate providing unit 12 is operated to provide a correction candidate so as to eliminate the mismatch in the link extracted by the condition detecting unit 13 as the mismatched link, which is the same as the step S4 of the first embodiment shown in FIG. 10. Then, instead of the step S5 of the first embodiment shown in FIG. 10, control is passed to the importance calculating unit 15 so that the importance calculating unit 15 calculates the importance value for the mismatched link, shown as step S8 in FIG. 26.

[0239] The importance calculating unit 15 is operated to calculate the importance value of the link extracted as the mismatched link by the condition detecting unit 13, and to output the calculated importance values as a ranking list, shown as the steps S8 and S9 in FIG. 26. In this embodiment, the importance value may be calculated based on at least one factor, or a combination of a plurality of factors, including: (1) the kind of error or unsuitability of the detected parts; (2) the accuracy of the error or unsuitability of the detected parts; (3) the number of targeted links of the page including the detected parts; (4) a record of the frequency of user access to the page including the detected parts; and (5) a stratification level in the hypertext of the page including the detected parts.
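One way to combine the five factors is a weighted sum, sketched below. The embodiment does not state how the factors are combined, so the linear form, the weights, the field names, and the treatment of the stratification level (shallower pages score higher) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    link_id: int
    severity: float      # factor (1): kind of error, mapped to a number
    accuracy: float      # factor (2): accuracy of the detection, 0.0 to 1.0
    targeted_links: int  # factor (3): number of targeted links of the page
    access_count: int    # factor (4): recorded accesses to the page
    depth: int           # factor (5): stratification level, 0 for the top page


# Hypothetical weights; a real system would tune or configure these.
WEIGHTS = {"severity": 1.0, "accuracy": 1.0, "targeted": 0.1, "access": 0.01, "depth": 1.0}


def importance(f):
    """Return an importance value as a weighted sum of the five factors."""
    return (
        WEIGHTS["severity"] * f.severity
        + WEIGHTS["accuracy"] * f.accuracy
        + WEIGHTS["targeted"] * f.targeted_links
        + WEIGHTS["access"] * f.access_count
        + WEIGHTS["depth"] / (1 + f.depth)  # assumed: shallow pages matter more
    )


def ranking_list(findings):
    """Return the findings sorted by descending importance value."""
    return sorted(findings, key=importance, reverse=True)
```

A ranking produced this way places a mismatch on a frequently accessed top page above the same kind of mismatch on a rarely visited deep page.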

[0240] Referring to FIG. 27 of the drawings, there is shown a display screen including the ranking list of the outputted mismatched links. The ranking list shown in FIG. 27 includes an “importance value” in addition to the “kinds of mismatch,” the “correction candidate,” and the other items that are also included in the list in FIG. 13. More specifically, the importance value is obtained by grouping the links having the same link targets and the same link source descriptions into the same group, and calculating the importance value of the mismatched links for each of the groups. The groups are listed in descending order of importance value. The administrator can perform the step S6 in FIG. 26, in which the correction candidate is confirmed and rewritten, while referring to the ranking list. Because the ranking list is ordered by importance value as described above, the administrator can easily conduct the step S6 in FIG. 26.

[0241] After that, in the following step S7 in FIG. 26, the correction reflecting unit 14 reflects the correction for each of the documents in the hypertext database 21 in accordance with the confirmed or corrected correction candidate. This step is conducted in the same manner as in the first embodiment.

[0242] Although it is described in this embodiment that the importance calculating unit 15 is operated to calculate the importance value of the mismatched link and to output the calculated importance values as a ranking list after the candidate providing unit 12 is operated to provide the correction candidate, the present invention is not limited to this embodiment. The order of the processes may be changed arbitrarily. For example, in another embodiment, the importance calculating unit 15 may be operated to calculate the importance value of the mismatched link and to output the calculated importance values as a ranking list before the candidate providing unit 12 is operated to provide the correction candidate.

[0243] Although it is described in this embodiment that the administrator confirms the outputted mismatched link and correction candidate in the step S6 in FIG. 26, the present invention is not limited to this embodiment. In another embodiment, the step S6 may be omitted and the steps S1 through S7 may be performed automatically.

[0244] Although it is described in this embodiment that the administrator decides the timing of confirmation, the present invention is not limited to this embodiment. For example, in another embodiment, the collection conditions and the extraction conditions may be determined in advance, and the steps S1 to S4, S8, and S9 may be performed automatically and periodically. In this case, the results may be reported to the administrator by electronic mail or the like.

[0245] The collection and storage of information about pages and links from the hypertext database 21, which is performed by the information collecting unit 11 shown in FIG. 25 in this embodiment, may be performed by another data processing unit, which is not shown in the drawings. In such a case, the data processing unit 5 shown in FIG. 25 of this embodiment does not need to include the information collecting unit 11. Furthermore, the administrator can correct the mismatched parts in the hypertext database 21 manually while viewing a display screen of a list of the results shown in FIG. 27. In such a case, the data processing unit 5 shown in FIG. 25 of this embodiment does not need to include the correction reflecting unit 14.

[0246] Furthermore, the administrator can select a correction candidate by himself/herself with the help of the information shown in the list of the display screen in FIG. 27 even if the list does not include the kind of mismatched link and the correction candidate. In such a case, the data processing unit 5 shown in FIG. 25 of this embodiment does not need to include the candidate providing unit 12.

[0247] Third Preferred Embodiment

[0248] Referring now to FIG. 28 of the drawings, there is shown a third preferred embodiment of the hypertext checking apparatus according to the present invention.

[0249] As shown in FIG. 28, the data processing unit 6 of the third embodiment includes the same constituent elements as those of the data processing unit 5 shown in FIG. 25 in the second embodiment, except that the data processing unit 6 of this embodiment includes a total score calculating unit 16 instead of the correction reflecting unit 14.

[0250] The total score calculating unit 16 is adapted to calculate a total score of the targeted site based on the mismatched links detected by the condition detecting unit 13 and the importance values of the mismatched links calculated by the importance calculating unit 15. In this embodiment, in addition to using the sum of the importance values of the mismatched links calculated by the importance calculating unit 15, the total score may also be calculated based on the number of the mismatched links or the ratio of the number of mismatched links to the total number of links.
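The three quantities named above can be computed as in the following sketch. How they are combined into a single total score is not specified in this embodiment; the quality_score function, which maps a lower mismatch ratio to a higher score out of 100, is purely a hypothetical example, as are the function and key names.

```python
def total_score_indicators(importance_values, total_links):
    """Compute the quantities on which a total score may be based.

    importance_values holds one importance value per detected mismatched link.
    """
    count = len(importance_values)
    ratio = count / total_links if total_links else 0.0
    return {
        "importance_sum": sum(importance_values),
        "mismatch_count": count,
        "mismatch_ratio": ratio,
    }


def quality_score(importance_values, total_links):
    """Hypothetical total score: 100 when no link is mismatched, 0 when all are."""
    ratio = total_score_indicators(importance_values, total_links)["mismatch_ratio"]
    return 100.0 * (1.0 - ratio)
```

A score defined so that it rises as mismatches are corrected matches the description of FIG. 30, where the total score increases over time and then saturates; a score based on the importance sum alone would instead fall as the site improves.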

[0251] The operation of the hypertext checking apparatus according to the present invention will be described below with reference to the drawings.

[0252] The operations of the information collecting unit 11, the candidate providing unit 12, the condition detecting unit 13, and the importance calculating unit 15 of this embodiment, shown in the steps S1 to S4 and S8 in FIG. 29, are the same as those of the second embodiment shown in FIG. 26, and the description of these steps is therefore omitted.

[0253] In the above second embodiment, the correction is reflected in the hypertext database 21 in accordance with the correction candidate after the mismatched link is detected. In this embodiment, by contrast, as shown in the step S10 in FIG. 29, the total score calculating unit 16 is operated to calculate the total score of the targeted site based on the importance value calculated by the importance calculating unit 15 after the mismatched link is detected in the step S3. Then, the total score calculating unit 16 outputs the calculated total score.

[0254] The total score calculating unit 16 may perform this calculation periodically and output the calculated total score each time. FIG. 30 shows the outputted total scores over time. With these results, it is possible to see the progress of improvement in the quality of the targeted site. Referring to FIG. 30, as time goes on, the rise in the total score becomes saturated. It is understood from this result that the process of improving the quality of the targeted site is coming to an end.

[0255] In this embodiment, the total score calculating unit 16 may calculate the total score at regular intervals, and an alert may be issued when a predetermined condition is fulfilled, such as when the total score or the importance value of the parts detected as the mismatched link exceeds a predetermined threshold. With this function, the administrator can receive the alert when the quality of the site declines.
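The alert condition can be sketched as a simple threshold check. The function name, the parameter names, and the returned reason strings are assumptions for illustration; the comparison follows the wording above, in which exceeding a threshold triggers the alert, and would need to be reversed for a score defined so that higher means better quality.

```python
def alert_reasons(total_score, importance_values, score_threshold, importance_threshold):
    """Return the reasons for raising an alert, or an empty list if none apply."""
    reasons = []
    if total_score > score_threshold:
        reasons.append("total score")
    if any(value > importance_threshold for value in importance_values):
        reasons.append("importance value")
    return reasons
```

A periodic job could call this function after each calculation and notify the administrator, for example by electronic mail, whenever the returned list is not empty.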

[0256] The total score calculating unit 16 may calculate the total score of each of a plurality of different sites “A” to “M”. FIG. 31 shows an example of the results outputted by the total score calculating unit 16. Here, the results are listed in descending order of level. With these results, the administrator is capable of quantitatively comparing the qualities of the sites with each other. It is seen from FIG. 31, for example, that the quality of the site “A” is twice as high as that of the site “E”.

[0257] The effect of this embodiment will be described below.

[0258] In this embodiment, the total score of the quality of the targeted site is calculated based on the number of the detected mismatched links and the importance value. For this reason, it is possible to grasp the progress of improvement in the quality of the site, and to quantitatively compare the qualities of different sites with each other.

[0259] Although the data processing unit 6 of this embodiment includes the information collecting unit 11, the information collecting unit 11 may be omitted from the data processing unit 6, because the collection and storage of information about pages and links from the hypertext database 21, which is performed by the information collecting unit 11 in this embodiment, may be performed by another data processing unit, not shown.

[0260] Although not mentioned above, the correction of the detected mismatched parts may be reflected in the hypertext database 21 upon request. When the correction is reflected, the administrator may correct the mismatched parts in the hypertext database 21 manually while viewing a display screen of a list of the results shown in FIG. 27. Alternatively, there may be provided a correction reflecting unit 14 similar to that of the second embodiment.

[0261] Even if there is no information about the kind of mismatched link or the correction candidate, the administrator can derive a correction candidate from the other information shown on the display screen in FIG. 27. Therefore, the candidate providing unit 12 in this embodiment may be omitted from the data processing unit 6.

[0262] Fourth Preferred Embodiment

[0263] The fourth preferred embodiment of the hypertext checking computer program product according to the present invention will be described below with reference to the drawings.

[0264] The fourth preferred embodiment of the hypertext checking program product includes a computer usable storage medium, not shown in the drawings, such as a CD-ROM, DVD-ROM, MO, hard disk, EPROM, EEPROM, and so on, having computer readable code embodied therein for checking a hypertext. The computer readable code may alternatively be downloaded from a server on a network such as the Internet.

[0265] Referring now to FIG. 32 of the drawings, there is shown one example of a system including an input unit 501, a data processing unit 502, an output device 503, and a storage device 504, which are similar to the constituent elements of the apparatus of the first preferred embodiment. This system further includes a hypertext checking program 500 for carrying out the function of the fourth preferred embodiment of the hypertext checking program product according to the present invention, which is similar to that of the first embodiment of the hypertext checking apparatus.

[0266] The input unit 501 is adapted to allow an operator to input an instruction therethrough. The input unit 501 is, for example, a mouse, a keyboard, and so on. The output device 503 is adapted to output a processing result from the data processing unit 502. The output device 503 is, for example, a display screen of a displaying unit, a printer, and so forth.

[0267] The hypertext checking program 500 is read out from the computer usable storage medium to the data processing unit 502. The hypertext checking program 500 is then executed by the data processing unit 502 to control the operation of the data processing unit 502, and to create an input memory 505 and a working memory 506 in the storage device 504. The hypertext checking program 500 can therefore establish, in the data processing unit 502, the functions of the information collecting unit 11, the candidate providing unit 12, the condition detecting unit 13 and the correction reflecting unit 14 in the first embodiment of the hypertext checking apparatus shown in FIG. 1. The data processing unit 502 thus constructed can perform the same steps as those of the first embodiment by executing the hypertext checking program 500.

[0268] The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to the data processing unit 1 and the storage device 2 shown in FIG. 1, respectively. In this embodiment, the data processing unit 502 may be operated to access an external database by way of a network, such as the Internet, in addition to the hypertext database 21, which is stored in the storage device 2 and is the target of the check shown in FIG. 1.

[0269] Fifth Preferred Embodiment

[0270] The fifth preferred embodiment of the hypertext checking computer program product according to the present invention will be described below with reference to the drawings.

[0271] The configuration of the fifth embodiment is shown in FIG. 32, which is the same figure as for the above fourth embodiment. The fifth preferred embodiment of the hypertext checking program product includes a computer usable storage medium, not shown, having computer readable code embodied therein for checking a hypertext.

[0272] The hypertext checking program 500 is read out from the computer usable storage medium to the data processing unit 502. The hypertext checking program 500 is then executed by the data processing unit 502 to control the operation of the data processing unit 502, and to create an input memory 505 and a working memory (or working area) 506 in the storage device 504. The hypertext checking program 500 can therefore establish, in the data processing unit 502, the functions of the information collecting unit 11, the candidate providing unit 12, the condition detecting unit 13, the correction reflecting unit 14 and the importance calculating unit 15 in the second embodiment of the hypertext checking apparatus shown in FIG. 25. The data processing unit 502 thus constructed can perform the same steps as those of the second embodiment by executing the hypertext checking program 500.

[0273] The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to the data processing unit 5 and the storage device 2 shown in FIG. 25, respectively. In this embodiment, the data processing unit 502 may be operated to access an external database by way of a network, such as the Internet, in addition to the hypertext database 21, which is stored in the storage device 2 and is the target of the check shown in FIG. 1.

[0274] Sixth Preferred Embodiment

[0275] The sixth preferred embodiment of the hypertext checking computer program product according to the present invention will be described below with reference to the drawings.

[0276] The configuration of the sixth embodiment is shown in FIG. 32, which is the same figure as for the above fourth embodiment. The sixth preferred embodiment of the hypertext checking program product includes a computer usable storage medium, not shown, having computer readable code embodied therein for checking a hypertext.

[0277] The hypertext checking program 500 is read out from the computer usable storage medium to the data processing unit 502. The hypertext checking program 500 is then executed by the data processing unit 502 to control the operation of the data processing unit 502, and to create an input memory (or input buffer) 505 and a working memory 506 in the storage device 504. The hypertext checking program 500 can therefore establish, in the data processing unit 502, the functions of the information collecting unit 11, the candidate providing unit 12, the condition detecting unit 13, the importance calculating unit 15 and the total score calculating unit 16 in the third embodiment of the hypertext checking apparatus shown in FIG. 28. The data processing unit 502 thus constructed can perform the same steps as those of the third embodiment by executing the hypertext checking program 500.

[0278] The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to the data processing unit 6 and the storage device 2 shown in FIG. 28, respectively. In this embodiment, the data processing unit 502 may be operated to access an external database by way of a network, such as the Internet, in addition to the hypertext database 21, which is stored in the storage device 2 and is the target of the check shown in FIG. 1.

[0279] As described above, the following effects can be achieved according to the embodiments of the present invention.

[0280] The present invention has a first advantage over the prior art in making it possible to detect all kinds of logical mismatches, for the following reason. According to the present invention, the kinds of detectable logical mismatches include: (1) putting a link to a wrong destination; (2) a link to expired information; (3) disunity in the link source descriptions; and (4) disunity in the styles of the link source descriptions. These can be detected because the mismatched link detecting method includes the steps of: extracting the link information from the hypertext database; grouping the links by each item of the link information; and detecting a particular link excluded from the group and regarding the detected particular link as a mismatched link. Logical mismatches such as (2), the link to expired information, can be detected by repeating the collection of the link information periodically and focusing on a change in the link information over time.

[0281] Furthermore, (5) the phantom link, one example of a logical mismatch, can be detected by detecting a link having no link source description, and (6) the loop link, another example of a logical mismatch, can be detected by detecting the links that are included in a group of links forming a loop and whose link source descriptions are relevant to the same topic.

[0282] The present invention has a second advantage over the prior art in that the method of correcting the mismatched links can be determined automatically, thereby making it unnecessary for the administrator to consider how to correct the mismatched links. This advantage is obtained because the candidate correcting method includes a process of automatically calculating the correction so as to harmonize the link information of the particular link with the link information of the other links in the group.

[0283] The present invention has a third advantage over the prior art in that the efficiency of checking by the administrator can be considerably enhanced. Because the grouped mismatched links can be collectively displayed on a display screen, the administrator only has to confirm a part of the links to judge whether the remaining links are mismatched or not.

[0284] The present invention has a fourth advantage over the prior art in making it possible to grasp the correction items for each page, intensively examine mismatches involving a key page, and examine the suitability of the expression used for the link source description. This advantage is obtained because there may be provided a display screen on which is displayed a list having three items including: (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page.

[0285] The present invention has a fifth advantage over the prior art in making it possible to grasp the progress of improvement in the quality of a site, and to quantitatively compare the qualities of different sites with each other. This advantage is obtained because the total score of the quality of the targeted site is calculated based on the number of the detected mismatched links and the importance value.

What is claimed is:
 1. An apparatus for checking a hypertext, targeting a hypertext database, capable of detecting a part including a logically mismatched link in said hypertext database.
 2. The apparatus for checking a hypertext as set forth in claim 1, which is operated to detect at least one of the following parts as said part, said parts including: a part having a mismatch between a link source description and contents of a link target page, said link target page being linked with said link source description; a part having a mismatch between a link source description and contents of a link target page, the contents of said link target page being changed, said link target page being linked with said link source description; a part having a disunity among a plurality of link source descriptions having a same link target page; a part having a disunity in styles among a plurality of link source descriptions included in a same page or peripheral pages; a part having no link source description; and a part including a group of links forming a loop, the link source descriptions of said links relating to a same topic.
 3. An apparatus for checking a hypertext comprising: an information storing unit which stores information about links related to said hypertext; and a condition detecting unit which refers to said information storing unit to detect a part including a logically mismatched link.
 4. The apparatus for checking a hypertext as set forth in claim 3, further comprising an information collecting unit which collects said information about the links related to said hypertext, wherein said information storing unit stores said information about the links collected by said information collecting unit.
 5. The apparatus for checking a hypertext as set forth in claim 3, further comprising a candidate providing unit which provides a correction candidate related to said part including the logically mismatched link detected by said condition detecting unit.
 6. The apparatus for checking a hypertext as set forth in claim 5, further comprising an importance calculating unit which calculates an importance value of said part including the logically mismatched link detected by said condition detecting unit.
 7. The apparatus for checking a hypertext as set forth in claim 5, further comprising a correction reflecting unit which corrects said hypertext based on said part including the logically mismatched link detected by said condition detecting unit and said correction candidate provided by said candidate providing unit.
 8. The apparatus for checking a hypertext as set forth in claim 6, further comprising a total score calculating unit which calculates a total score related to said hypertext based on at least one of factors including: the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the rate of the number of said parts detected by said condition detecting unit corresponding to the total number of the links.
 9. The apparatus for checking a hypertext as set forth in claim 3, further comprising an importance calculating unit which calculates the importance value of the part including the logically mismatched link detected by said condition detecting unit.
 10. The apparatus for checking a hypertext as set forth in claim 9, further comprising a total score calculating unit which calculates a total score related to said hypertext based on at least one of factors including: the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the rate of the number of said parts detected by said condition detecting unit corresponding to the total number of the links.
 11. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to divide said information about the links into some groups in accordance with a predetermined condition and to detect a minor group as said part including the logically mismatched link.
 12. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a part including a link of which a link source description and contents of a link target page are mismatched as said part including the logically mismatched link.
 13. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by comparing link target pages of a plurality of links having a same link source description with each other; (3) a third score calculated by comparing link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth score calculated by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 14. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of a link target page, said link source description being linked with said link target page, and said mismatch being caused by changing the contents of said link target page.
 15. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by detecting a notice description including a movement notice description or an expiration notice description included in the contents of a link target page; and (3) a third score calculated by detecting a description of period of validity included in the contents of a link target page and comparing said period of validity and present date and time.
 16. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a part having a disunity among a plurality of link source descriptions having a same link target page.
 17. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a part having a disunity in styles among a plurality of link source descriptions included in a same page or peripheral pages.
 18. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to divide said information about the links into some groups including a major group and a minor group in accordance with a predetermined condition and to detect said minor group as said part including the logically mismatched link.
 19. The apparatus for checking a hypertext as set forth in claim 18, wherein said candidate providing unit is operated to provide a correction candidate that makes said minor group the same as said major group.
 20. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to detect a part including a link of which a link source description and contents of a link target page are mismatched as said part including the logically mismatched link.
 21. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by comparing link target pages of a plurality of links having a same link source description with each other; (3) a third score calculated by comparing link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth score calculated by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 22. The apparatus for checking a hypertext as set forth in claim 21, wherein said candidate providing unit is operated to provide at least one of the following correction candidates including: (1) a first correction candidate for the link source description obtained by comparing the link source descriptions of a plurality of links having a same link target page with each other; (2) a second correction candidate for the link target obtained by comparing target pages of a plurality of links having a same link source description with each other; (3) a third correction candidate for the link target obtained by comparing link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth correction candidate for the link source description obtained by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 23. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of a link target page, said link source description being linked with said link target page, and said mismatch being caused by changing the contents of said link target page.
 24. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by detecting a notice description including a movement notice description or an expiration notice description included in the contents of the link target page; and (3) a third score calculated by detecting a description of period of validity included in the contents of a link target page and comparing said period of validity and present date and time.
 25. The apparatus for checking a hypertext as set forth in claim 24, wherein said candidate providing unit is operated to provide at least one of the following correction candidates including: (1) a first correction candidate for the link source description obtained by comparing link source descriptions of a plurality of links having a same link target page with each other; and (2) a second correction candidate for the link target obtained by extracting the description of a new moved address from the contents of a link target page.
 26. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to detect a part having a disunity among a plurality of link source descriptions having a same link target page, and said candidate providing unit provides a correction candidate for the link source description by comparing link source descriptions of a plurality of links having a same link target page as that of said part detected by said condition detecting unit.
 27. The apparatus for checking a hypertext as set forth in claim 5, wherein said condition detecting unit is operated to detect a part having a disunity in styles among a plurality of link source descriptions included in a same page or peripheral pages, and said candidate providing unit is operated to provide said correction candidate for the style of the link source description by comparing the style of a plurality of link source descriptions included in the detected part detected by said condition detecting unit.
 28. The apparatus for checking a hypertext as set forth in claim 4, wherein said information collecting unit is operated to repeatedly collect said information about the links in the hypertext, and said information storing unit stores a plurality of said information about the links collected at a plurality of different times.
 29. The apparatus for checking a hypertext as set forth in claim 28, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of a link target page by referring to said information storing unit and calculating changes of the numbers of the links or kinds of the link source description to the link target page during said times, the contents of said link target page being changed.
 30. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a link having no link source description as said part including the logically mismatched link.
 31. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a link having the link source description in which no character strings or images are included, or a link having the link source description in which a character string or an image expressed in an inconspicuous color or size is included, as said part including the logically mismatched link.
 32. The apparatus for checking a hypertext as set forth in claim 3, wherein said condition detecting unit is operated to detect a group of links forming a loop as said part, the link source descriptions of said links relating to a same topic.
 33. The apparatus for checking a hypertext as set forth in claim 6, wherein said importance calculating unit is operated to calculate the importance value based on at least one of the following factors including: (1) a sort of errors or unsuitability of the detected part detected by said condition detecting unit; (2) accuracy of errors or unsuitability of said detected part; (3) the number of links which are connected to the page including said detected part; (4) a record of frequency of access to the page including said detected part; and (5) a stratification level in the hypertext of the page including said detected part.
 34. The apparatus for checking a hypertext as set forth in claim 6, wherein said importance calculating unit is operated to calculate the importance value of the detected part detected by said condition detecting unit, and to control output condition for said detected part in accordance with said importance value, said output condition including the number of outputting said detected part or a method of outputting said detected part.
 35. The apparatus for checking a hypertext as set forth in claim 4, wherein said information collecting unit is operated to extract character strings corresponding to the link source description by character recognition when the link source description is an image, and to register said extracted character strings as said information about links in said information storing unit.
 36. The apparatus for checking a hypertext as set forth in claim 1, wherein a hypertext on a Web site is the target to be checked.
 37. The apparatus for checking a hypertext as set forth in claim 3, wherein a hypertext on a Web site is the target to be checked.
 38. A method of checking a hypertext comprising the steps of: (a) accepting a condition for detecting a part from a hypertext database, said part including a part having an error or a mismatch in a link source description or a relationship between links; (b) detecting said part based on said condition; and (c) displaying, on a display screen, a result of the detection as a list with three items including: (1) a link source description; (2) identification information about a link source page; and (3) identification information about a link target page.
 39. The method of checking a hypertext as set forth in claim 38, wherein, in said step (c), said list is sorted using one of said three items as a key.
 40. The method of checking a hypertext as set forth in claim 38, further comprising the steps of: (d) accepting a correction candidate for said three items; and (e) correcting said hypertext database in accordance with said correction candidate accepted in said step (d).
 41. The method of checking a hypertext as set forth in claim 38, further comprising the step of specifying a hypertext database to be checked.
 42. A method of checking a hypertext comprising the steps of: (a) collecting information about links in a Web site; (b) detecting a part including a logically mismatched link by referring to said information collected in said step (a); (c) calculating an importance value of said part detected in said step (b); (d) calculating a total score related to said Web site; (e) performing periodically said steps (a) to (d) for said Web site; and (f) notifying of a change over time in said total score related to said Web site.
 43. A method of checking a hypertext comprising the steps of: (a) collecting information about links in a Web site; (b) detecting a part including a logically mismatched link by referring to said information collected in said step (a); (c) calculating an importance value of said part detected in said step (b); (d) calculating a total score related to said Web site; (e) performing periodically said steps (a) to (d) for said Web site; and (f) issuing an alarm when said total score related to said Web site or said importance value of said part fulfills a predetermined condition.
 44. A method of checking a hypertext comprising the steps of: (a) collecting information about links in a Web site; (b) detecting a part including a logically mismatched link by referring to said information collected in said step (a); (c) calculating an importance value of said part detected in said step (b); (d) calculating a total score related to said Web site; (e) performing said steps (a) to (d) for a plurality of Web sites specified as targets; and (f) outputting said total scores of said plurality of Web sites as a ranking list.
 45. A computer program product comprising a computer usable storage medium having computer readable code embodied therein, said computer readable code being executed by a computer including an information storing unit which stores information about links related to a hypertext, said computer readable code including a code for having said computer serve as a condition detecting unit which refers to said information storing unit to detect a part including a logically mismatched link.
 46. A computer program product comprising a computer usable storage medium having computer readable code embodied therein, said computer readable code being executed by a computer having an information storing unit, said computer readable code including a code for having said computer serve as: an information collecting unit which collects information about links related to a hypertext and stores said information in said information storing unit; and a condition detecting unit which refers to said information storing unit to detect a part including a logically mismatched link.
 47. The computer program product as set forth in claim 46, wherein said computer readable code includes a code for having said computer serve as a candidate providing unit which provides a correction candidate related to said part including the logically mismatched link detected by said condition detecting unit.
 48. The computer program product as set forth in claim 47, wherein said computer readable code includes a code for having said computer serve as an importance calculating unit which calculates an importance value of said part including the logically mismatched link detected by said condition detecting unit.
 49. The computer program product as set forth in claim 47, wherein said computer readable code includes a code for having said computer serve as a correction reflecting unit which corrects said hypertext based on said part including the logically mismatched link detected by said condition detecting unit and said correction candidate provided by said candidate providing unit.
 50. The computer program product as set forth in claim 48, wherein said computer readable code includes a code for having said computer serve as a total score calculating unit which calculates a total score related to said hypertext based on at least one of factors, said factors including the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the ratio of the number of said parts detected by said condition detecting unit to the total number of the links.
 51. The computer program product as set forth in claim 45, wherein said computer readable code includes a code for having said computer serve as an importance calculating unit which calculates the importance value of the part including the logically mismatched link detected by said condition detecting unit.
 52. The computer program product as set forth in claim 51, wherein said computer readable code includes a code for having said computer serve as a total score calculating unit which calculates a total score related to said hypertext based on at least one of factors, said factors including the importance value calculated by said importance calculating unit, the number of said parts detected by said condition detecting unit, and the ratio of the number of said parts detected by said condition detecting unit to the total number of the links.
 53. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to divide said information about the links into some groups in accordance with a predetermined condition and to detect a minor group as said part including the logically mismatched link.
 54. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a part including a link of which a link source description and contents of the link target page are mismatched as said part including the logically mismatched link.
 55. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by comparing link target pages of a plurality of links having a same link source description with each other; (3) a third score calculated by comparing link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth score calculated by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 56. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of the link target page, said link source description being linked with said link target page, and said mismatch being caused by changing the contents of said link target page.
 57. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by detecting a notice description including a movement notice description or an expiration notice description included in the contents of a link target page; and (3) a third score calculated by detecting a description of period of validity included in the contents of a link target page and comparing said period of validity and present date and time.
 58. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a part having a disunity among a plurality of link source descriptions having a same link target page.
 59. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a part having a disunity in styles among a plurality of link source descriptions included in a same page or peripheral pages.
 60. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to divide said information about the links into some groups including a major group and a minor group in accordance with a predetermined condition and to detect said minor group as said part including the logically mismatched link.
 61. The computer program product as set forth in claim 60, wherein said candidate providing unit is operated to provide a correction candidate that makes said minor group the same as said major group.
 62. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to detect a part including a link of which a link source description and contents of a link target page are mismatched as said part including the logically mismatched link.
 63. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing the link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by comparing the target pages of a plurality of links having a same link source description with each other; (3) a third score calculated by comparing the link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth score calculated by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 64. The computer program product as set forth in claim 63, wherein said candidate providing unit is operated to provide at least one of the following correction candidates, said correction candidates including: (1) a first correction candidate for the link source description obtained by comparing the link source descriptions of a plurality of links having a same link target page with each other; (2) a second correction candidate for the link target obtained by comparing target pages of a plurality of links having a same link source description with each other; (3) a third correction candidate for the link target obtained by comparing link target pages of a plurality of links having a same link target page and a same link source description with each other; and (4) a fourth correction candidate for the link source description obtained by comparing contents of a link source description and contents of a link target page, said link source description being linked with said link target page.
 65. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of a link target page, said link source description being linked with said link target page, and said mismatch being caused by changing the contents of said link target page.
 66. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to calculate criteria scores of the links based on at least one of the following scores and to detect a link with a high criteria score as said part, said scores including: (1) a first score calculated by comparing link source descriptions of a plurality of links having a same link target page with each other; (2) a second score calculated by detecting a notice description including a movement notice description or an expiration notice description included in the contents of a link target page; and (3) a third score calculated by detecting a description of period of validity included in the contents of a link target page and comparing said period of validity and present date and time.
 67. The computer program product as set forth in claim 66, wherein said candidate providing unit is operated to provide at least one of the following correction candidates, said correction candidates including: (1) a first correction candidate for the link source description obtained by comparing link source descriptions of a plurality of links having a same link target page with each other; and (2) a second correction candidate for the link target obtained by extracting the description of a new moved address from the contents of a link target page.
 68. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to detect a part having a disunity among a plurality of link source descriptions having a same link target page, and said candidate providing unit provides a correction candidate for the link source description by comparing link source descriptions of a plurality of links having a same link target page as that of said part detected by said condition detecting unit.
 69. The computer program product as set forth in claim 47, wherein said condition detecting unit is operated to detect a part having a disunity in styles among a plurality of link source descriptions included in a same page or peripheral pages, and said candidate providing unit is operated to provide said correction candidate for the style of the link source description by comparing the style of a plurality of link source descriptions included in the detected part detected by said condition detecting unit.
 70. The computer program product as set forth in claim 46, wherein said information collecting unit is operated to repeatedly collect said information about the links in the hypertext, and said information storing unit stores a plurality of said information about the links collected at a plurality of different times.
 71. The computer program product as set forth in claim 70, wherein said condition detecting unit is operated to detect a part having a mismatch between a link source description and contents of a link target page by referring to said information storing unit and calculating changes of the numbers of the links or kinds of the link source description to the link target page during said times, the contents of said link target page being changed.
 72. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a link having no link source description as said part including the logically mismatched link.
 73. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a link having the link source description in which no character strings or images are included, or a link having the link source description in which a character string or an image expressed in an inconspicuous color or size is included, as said part including the logically mismatched link.
 74. The computer program product as set forth in claim 45, wherein said condition detecting unit is operated to detect a group of links forming a loop as said part, the link source descriptions of said links relating to a same topic.
 75. The computer program product as set forth in claim 48, wherein said importance calculating unit is operated to calculate the importance value based on at least one of the following factors including: (1) a sort of errors or unsuitability of the detected part detected by said condition detecting unit; (2) accuracy of errors or unsuitability of said detected part; (3) the number of links which are connected to the page including said detected part; (4) a record of frequency of access to the page including said detected part; and (5) a stratification level in the hypertext of the page including said detected part.
 76. The computer program product as set forth in claim 48, wherein said importance calculating unit is operated to calculate the importance value of the detected part detected by said condition detecting unit, and to control output condition for said detected part in accordance with said importance value, said output condition including the number of outputting said detected part or a method of outputting said detected part.
 77. The computer program product as set forth in claim 46, wherein said information collecting unit is operated to extract character strings corresponding to the link source description by character recognition when the link source description is an image, and to register said extracted character strings as said information about links in said information storing unit.
 78. The computer program product as set forth in claim 45, wherein a hypertext on a Web site is the target to be checked.
 79. The computer program product as set forth in claim 46, wherein a hypertext on a Web site is the target to be checked.
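By way of informal illustration only, and without limiting or forming part of the claims, the grouping recited in claims 18, 19, 60 and 61 might be sketched as follows. The sketch assumes that the "predetermined condition" groups links by link target page and compares their link source descriptions; the names `Link`, `detect_minor_groups` and `mismatch_ratio` and the sample pages are hypothetical and do not appear in the specification.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    source_page: str   # identification information about the link source page
    description: str   # link source description (e.g. anchor text)
    target_page: str   # identification information about the link target page


def detect_minor_groups(links):
    """Return (link, suggested_description) pairs for links whose description
    is in the minority among links sharing the same link target page."""
    by_target = defaultdict(list)
    for link in links:
        by_target[link.target_page].append(link)

    findings = []
    for group in by_target.values():
        ranked = Counter(link.description for link in group).most_common()
        # Skip targets with a single description or with no strict majority.
        if len(ranked) < 2 or ranked[0][1] == ranked[1][1]:
            continue
        major = ranked[0][0]
        # The correction candidate makes the minor group match the major group.
        findings.extend((link, major) for link in group if link.description != major)
    return findings


def mismatch_ratio(links, findings):
    """Ratio of detected parts to the total number of links (one score factor)."""
    return len(findings) / len(links) if links else 0.0


# Hypothetical sample data.
links = [
    Link("/index.html", "Products", "/products.html"),
    Link("/about.html", "Products", "/products.html"),
    Link("/news.html", "Our goods", "/products.html"),
    Link("/index.html", "Contact", "/contact.html"),
]
findings = detect_minor_groups(links)
for link, suggestion in findings:
    print(f"{link.source_page}: '{link.description}' -> '{suggestion}'")
print(mismatch_ratio(links, findings))
```

Here a strict majority is required before anything is flagged, so a two-way tie between descriptions is left for manual review; this is one possible design choice, not one stated in the claims.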